---
title: Self-Healing DevOps Sandbox
emoji: 🔧
colorFrom: red
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Self-Healing DevOps Sandbox

An OpenEnv RL environment where an AI agent is dropped into a broken Node.js backend inside a Docker container. The agent must use bash commands only to diagnose the bugs, edit files, and fix the app -- just like a real DevOps engineer would.

Built for the Meta PyTorch OpenEnv Hackathon.


## What Is This?

A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.

| # | Difficulty | Bug | What's Wrong |
|---|------------|-----|--------------|
| 1 | Easy | `config.json` | Port set to 9999 instead of 3000 |
| 2 | Medium | `routes/users.js` | Missing `)` causes a SyntaxError crash |
| 3 | Hard | `routes/data.js` | Missing `await` causes HTTP 500 |

**Goal:** fix all bugs so these endpoints return HTTP 200:

- `GET /health` returns `{"status": "ok"}`
- `GET /api/users` returns `{"users": [...]}`
- `GET /api/data` returns `{"records": [...]}`
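The expected response shapes can be captured in a small helper. This is only a sketch of the checks implied by the list above; the real grader's logic lives in `server/devops_sandbox_environment.py` and may differ:

```python
import json

# Expected JSON shape per endpoint, per the task description above.
EXPECTED = {
    "/health": lambda d: d.get("status") == "ok",
    "/api/users": lambda d: isinstance(d.get("users"), list),
    "/api/data": lambda d: isinstance(d.get("records"), list),
}

def endpoint_ok(path: str, body: str) -> bool:
    """Return True if the raw response body parses as JSON and matches the shape."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    check = EXPECTED.get(path)
    return bool(check and check(data))
```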

## Scoring (Partial Rewards)

The grader runs after every command and awards cumulative points:

| Milestone | Points | Total |
|-----------|--------|-------|
| App starts on port 3000 | +0.35 | 0.35 |
| `/health` returns 200 | +0.10 | 0.45 |
| `/api/users` returns valid JSON | +0.15 | 0.60 |
| `/api/data` returns valid JSON | +0.25 | 0.85 |
| All endpoints correct | +0.15 | 1.00 |
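The cumulative scheme above can be sketched as a simple sum over passed milestones. The milestone names here are illustrative, not the grader's actual identifiers:

```python
# Milestones and rewards, mirroring the scoring table above.
MILESTONES = [
    ("app_on_3000", 0.35),
    ("health_200", 0.10),
    ("users_json", 0.15),
    ("data_json", 0.25),
    ("all_correct", 0.15),
]

def grader_score(passed: set[str]) -> float:
    """Sum the rewards for every milestone the sandbox currently passes."""
    return round(sum(pts for name, pts in MILESTONES if name in passed), 2)
```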

## Getting Started

### Prerequisites

- Python 3.10+
- Docker Desktop (running)
- `uv` package manager (`pip install uv`)

### 1. Install Dependencies

```shell
cd devops_sandbox
uv sync
```

### 2. Build the Sandbox Docker Image

```shell
docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
```

### 3. Start the Environment Server

```shell
uv run server
```

The server starts at `http://localhost:8000`.

### 4. Run the Baseline Agent

In a separate terminal:

```shell
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."          # Linux/macOS
$env:OPENAI_API_KEY = "sk-..."          # Windows PowerShell

# Run the baseline
uv run python baseline.py
```

## Test Your Own Agent

### Option A: Use the Python Client

```python
from devops_sandbox import BashAction, DevopsSandboxEnv

with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
    # Reset creates a fresh Docker container
    result = env.reset()
    print(result.observation.stdout)        # Task description
    print(result.observation.grader_score)  # 0.0

    # Send bash commands
    result = env.step(BashAction(command="cat /app/config.json"))
    print(result.observation.stdout)        # File contents
    print(result.observation.grader_score)  # Score after grading

    # Fix a bug
    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
    print(result.observation.grader_score)  # Partial score

    # Check if done
    if result.done:
        print("Episode complete!")
```
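Building on that client, a whole episode can be driven from a single function. The command sequence below is a purely illustrative scripted "agent" (a real agent would choose its next command from each observation), and it assumes the environment server from step 3 is running:

```python
# A fixed command sequence aimed at the three known bugs. Illustrative only --
# a real agent would pick commands based on the observations it gets back.
SCRIPTED_FIXES = [
    "sed -i 's/9999/3000/' /app/config.json",  # Bug 1: wrong port
    "node --check /app/routes/users.js",       # Bug 2: surface the SyntaxError
    "grep -n await /app/routes/data.js",       # Bug 3: inspect the async handler
]

def run_scripted_episode(base_url: str = "http://localhost:8000") -> float:
    """Replay the scripted commands and return the final grader score."""
    # Imported lazily so the sketch can be read without the package installed.
    from devops_sandbox import BashAction, DevopsSandboxEnv

    with DevopsSandboxEnv(base_url=base_url).sync() as env:
        env.reset()
        score = 0.0
        for cmd in SCRIPTED_FIXES:
            result = env.step(BashAction(command=cmd))
            score = result.observation.grader_score
            if result.done:
                break
        return score
```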

### Option B: Use the REST API Directly

```shell
# Reset the environment
curl -X POST http://localhost:8000/reset

# Send a command
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"command": "ls -la /app"}}'
```

### Option C: Use the WebSocket Endpoint

Connect to `ws://localhost:8000/ws` for persistent sessions.


## Project Structure

```text
devops_sandbox/
|-- openenv.yaml                 # OpenEnv manifest
|-- pyproject.toml               # Python dependencies
|-- README.md                    # This file
|-- baseline.py                  # LLM-powered baseline agent
|-- models.py                    # BashAction & TerminalObservation schemas
|-- client.py                    # Python client for the environment
|
|-- server/
|   |-- app.py                   # FastAPI server (entry point)
|   +-- devops_sandbox_environment.py  # Environment logic + grader
|
+-- simulated_app/               # The broken Node.js app (Docker context)
    |-- Dockerfile               # node:20-slim sandbox container
    |-- package.json             # Express.js project
    |-- server.js                # Main entry point
    |-- config.json              # Bug 1: wrong port
    +-- routes/
        |-- users.js             # Bug 2: syntax error
        +-- data.js              # Bug 3: missing await
```

## How It Works

```text
+-----------+   BashAction    +------------+   docker exec   +--------------+
|   Agent   | --------------> |  OpenEnv   | --------------> |    Docker    |
| (LLM/RL)  |                 |   Server   |                 |  Container   |
|           | <-------------- |   (8000)   | <-------------- | (broken app) |
+-----------+  Observation    +-----+------+  stdout/stderr  +--------------+
               + grader_score       |
                              +-----+------+
                              |   Grader   |
                              | (curl test |
                              |  endpoints)|
                              +------------+
```

1. The agent sends a `BashAction` (e.g., `cat /app/config.json`).
2. The server runs it inside the Docker container via `docker exec`.
3. The grader restarts the Node app and curls all endpoints.
4. The observation returns `stdout`, `stderr`, the score (0.0-1.0), and feedback.

## Configuration

| Env Variable | Default | Description |
|--------------|---------|-------------|
| `OPENAI_API_KEY` | (required) | OpenAI API key for the baseline |
| `OPENAI_MODEL` | `gpt-4o-mini` | LLM model to use |
| `OPENAI_BASE_URL` | (OpenAI default) | Custom endpoint (Ollama, vLLM) |
| `MAX_TURNS` | `30` | Max steps per episode |
| `DEVOPS_SANDBOX_URL` | `http://localhost:8000` | Environment server URL |

## Use with Local LLMs (Ollama, vLLM)

```shell
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="llama3"
export OPENAI_API_KEY="dummy"
uv run python baseline.py
```

## Validation

```shell
uv run openenv validate
# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
```

## License

BSD-style license. See `LICENSE` for details.