---
title: Self-Healing DevOps Sandbox
emoji: 🔧
colorFrom: red
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---
# Self-Healing DevOps Sandbox

An OpenEnv RL environment where an AI agent is dropped into a broken Node.js backend inside a Docker container. The agent must use only bash commands to diagnose bugs, edit files, and fix the app, just like a real DevOps engineer would.

Built for the Meta PyTorch OpenEnv Hackathon.
## What Is This?

A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.
| # | Difficulty | File | What's Wrong |
|---|---|---|---|
| 1 | Easy | `config.json` | Port set to 9999 instead of 3000 |
| 2 | Medium | `routes/users.js` | Missing `)` causes a SyntaxError crash |
| 3 | Hard | `routes/data.js` | Missing `await` causes HTTP 500 |
**Goal:** Fix all bugs so these endpoints return HTTP 200:

- `GET /health` returns `{"status": "ok"}`
- `GET /api/users` returns `{"users": [...]}`
- `GET /api/data` returns `{"records": [...]}`
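The target behavior above can be expressed as a small check. This is an illustrative sketch, not code from the repo: `endpoint_ok` and `EXPECTED_KEYS` are hypothetical names, and the check only looks for the top-level key each endpoint should return.

```python
import json

# Hypothetical helper (not part of the repo): checks one endpoint's
# response against the spec above -- HTTP 200 plus the expected key.
EXPECTED_KEYS = {
    "/health": "status",
    "/api/users": "users",
    "/api/data": "records",
}

def endpoint_ok(path: str, status: int, body: str) -> bool:
    """Return True if the response matches the target behavior."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return EXPECTED_KEYS[path] in payload
```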
## Scoring (Partial Rewards)

The grader runs after every command and awards cumulative points:

| Milestone | Points | Total |
|---|---|---|
| App starts on port 3000 | +0.35 | 0.35 |
| `/health` returns 200 | +0.10 | 0.45 |
| `/api/users` returns valid JSON | +0.15 | 0.60 |
| `/api/data` returns valid JSON | +0.25 | 0.85 |
| All endpoints correct | +0.15 | 1.00 |
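One possible reading of the table as code, purely for illustration (the real grader lives in `server/devops_sandbox_environment.py`): each milestone adds its points independently, with a completion bonus when everything passes.

```python
# Illustrative only: reproduces the cumulative scoring table above.
# Whether the real grader gates later milestones on earlier ones is
# an assumption; here each milestone is scored independently.
def compute_score(app_up: bool, health_ok: bool,
                  users_ok: bool, data_ok: bool) -> float:
    score = 0.0
    if app_up:
        score += 0.35
    if health_ok:
        score += 0.10
    if users_ok:
        score += 0.15
    if data_ok:
        score += 0.25
    # Completion bonus once every endpoint behaves correctly
    if app_up and health_ok and users_ok and data_ok:
        score += 0.15
    return round(score, 2)
```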
## Getting Started

### Prerequisites

- Python 3.10+
- Docker Desktop (running)
- `uv` package manager (`pip install uv`)
### 1. Install Dependencies

```shell
cd devops_sandbox
uv sync
```
### 2. Build the Sandbox Docker Image

```shell
docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
```
### 3. Start the Environment Server

```shell
uv run server
```

The server starts at `http://localhost:8000`.
### 4. Run the Baseline Agent

In a separate terminal:

```shell
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."       # Linux/Mac
$env:OPENAI_API_KEY = "sk-..."       # PowerShell

# Run the baseline
uv run python baseline.py
```
## Test Your Own Agent

### Option A: Use the Python Client

```python
from devops_sandbox import BashAction, DevopsSandboxEnv

with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
    # Reset creates a fresh Docker container
    result = env.reset()
    print(result.observation.stdout)        # Task description
    print(result.observation.grader_score)  # 0.0

    # Send bash commands
    result = env.step(BashAction(command="cat /app/config.json"))
    print(result.observation.stdout)        # File contents
    print(result.observation.grader_score)  # Score after grading

    # Fix a bug
    result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
    print(result.observation.grader_score)  # Partial score

    # Check if done
    if result.done:
        print("Episode complete!")
```
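Your own agent only needs to produce the next bash command from the latest observation. A toy rule-based policy is sketched below; `next_command` is a hypothetical name, the string-matching rules are illustrative, and a real agent would put an LLM here instead. The returned string plugs straight into `env.step(BashAction(command=...))`.

```python
# Illustrative rule-based policy (not part of the repo): pick the next
# bash command from the latest stdout/stderr of the environment.
def next_command(stdout: str, stderr: str) -> str:
    if "9999" in stdout:
        # Bug 1: wrong port in config.json
        return "sed -i 's/9999/3000/' /app/config.json"
    if "SyntaxError" in stderr:
        # Bug 2: inspect the broken route file
        return "cat /app/routes/users.js"
    if "500" in stdout:
        # Bug 3: look for the missing await
        return "grep -n 'await' /app/routes/data.js"
    # Default: gather information first
    return "cat /app/config.json"
```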
### Option B: Use the REST API Directly

```shell
# Reset the environment
curl -X POST http://localhost:8000/reset

# Send a command
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"command": "ls -la /app"}}'
```
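The same calls work from any HTTP client. A minimal stdlib sketch, assuming only the request shapes shown in the curl example (`step_payload` and `post` are hypothetical helper names):

```python
import json
import urllib.request

def step_payload(command: str) -> bytes:
    """Build the JSON body for POST /step, matching the curl example."""
    return json.dumps({"action": {"command": command}}).encode()

def post(url: str, body: bytes = b"") -> str:
    """POST a JSON body and return the raw response text."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Usage (with the server running): `post("http://localhost:8000/reset")`, then `post("http://localhost:8000/step", step_payload("ls -la /app"))`.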
### Option C: Use the WebSocket Endpoint

Connect to `ws://localhost:8000/ws` for persistent sessions.
## Project Structure

```
devops_sandbox/
|-- openenv.yaml          # OpenEnv manifest
|-- pyproject.toml        # Python dependencies
|-- README.md             # This file
|-- baseline.py           # LLM-powered baseline agent
|-- models.py             # BashAction & TerminalObservation schemas
|-- client.py             # Python client for the environment
|
|-- server/
|   |-- app.py                          # FastAPI server (entry point)
|   +-- devops_sandbox_environment.py   # Environment logic + grader
|
+-- simulated_app/        # The broken Node.js app (Docker context)
    |-- Dockerfile        # node:20-slim sandbox container
    |-- package.json      # Express.js project
    |-- server.js         # Main entry point
    |-- config.json       # Bug 1: wrong port
    +-- routes/
        |-- users.js      # Bug 2: syntax error
        +-- data.js       # Bug 3: missing await
```
## How It Works

```
+-----------+   BashAction    +------------+   docker exec   +--------------+
|   Agent   | --------------> |  OpenEnv   | --------------> |    Docker    |
| (LLM/RL)  |                 |   Server   |                 |  Container   |
|           | <-------------- |   (8000)   | <-------------- | (broken app) |
+-----------+  Observation    +-----+------+  stdout/stderr  +--------------+
               + grader_score       |
                              +-----+------+
                              |   Grader   |
                              | (curl test |
                              |  endpoints)|
                              +------------+
```

1. Agent sends a `BashAction` (e.g., `cat /app/config.json`)
2. Server runs it inside the Docker container via `docker exec`
3. Grader restarts the Node app and curls all endpoints
4. Observation returns: stdout, stderr, score (0.0-1.0), feedback
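Step 2 above can be sketched as a thin `docker exec` wrapper. This is a hypothetical sketch, not the server's actual code (which lives in `server/devops_sandbox_environment.py`); the container name, the `bash -lc` invocation, and the 30-second timeout are all assumptions.

```python
import subprocess

def build_exec_argv(container: str, command: str) -> list[str]:
    """Build the docker exec argv for one BashAction command."""
    return ["docker", "exec", container, "bash", "-lc", command]

def run_in_sandbox(container: str, command: str) -> tuple[str, str]:
    """Run the command in the container; return (stdout, stderr)."""
    proc = subprocess.run(
        build_exec_argv(container, command),
        capture_output=True, text=True, timeout=30,
    )
    return proc.stdout, proc.stderr
```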
## Configuration

| Env Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | OpenAI API key for baseline |
| `OPENAI_MODEL` | `gpt-4o-mini` | LLM model to use |
| `OPENAI_BASE_URL` | (OpenAI default) | Custom endpoint (Ollama, vLLM) |
| `MAX_TURNS` | `30` | Max steps per episode |
| `DEVOPS_SANDBOX_URL` | `http://localhost:8000` | Environment server URL |
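For reference, the defaults in the table map onto a standard `os.environ.get` pattern. The `load_config` helper below is hypothetical; only the variable names and defaults come from the table.

```python
import os

# Illustrative only: reading the table's variables with their defaults.
def load_config() -> dict:
    return {
        "model": os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        "max_turns": int(os.environ.get("MAX_TURNS", "30")),
        "sandbox_url": os.environ.get(
            "DEVOPS_SANDBOX_URL", "http://localhost:8000"
        ),
    }
```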
## Use with Local LLMs (Ollama, vLLM)

```shell
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="llama3"
export OPENAI_API_KEY="dummy"

uv run python baseline.py
```
## Validation

```shell
uv run openenv validate
# Expected: [OK] devops_sandbox: Ready for multi-mode deployment
```
## License

BSD-style license. See `LICENSE` for details.