DEVessi committed
Commit cd601a6 · verified · 1 Parent(s): 05b0719

Upload folder using huggingface_hub

Dockerfile ADDED
@@ -0,0 +1,81 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ # - In-repo environments (with local OpenEnv sources)
+ # - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=devops_sandbox
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -1,10 +1,224 @@
- ---
- title: Devops Sandbox
- emoji: 😻
- colorFrom: pink
- colorTo: blue
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Self-Healing DevOps Sandbox
+ emoji: 🔧
+ colorFrom: red
+ colorTo: green
+ sdk: docker
+ pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+   - openenv
+ ---
+
+ # Self-Healing DevOps Sandbox
+
+ An OpenEnv RL environment where an AI agent is dropped into a **broken Node.js backend** inside a Docker container. The agent must use **bash commands only** to diagnose bugs, edit files, and fix the app -- just like a real DevOps engineer would.
+
+ Built for the **Meta PyTorch OpenEnv Hackathon**.
+
+ ---
+
+ ## What Is This?
+
+ A 3-task challenge of increasing difficulty. The agent starts in a Docker container with a broken Express.js app in `/app` and must make all endpoints healthy.
+
+ | # | Difficulty | Bug               | What's Wrong                         |
+ |---|------------|-------------------|--------------------------------------|
+ | 1 | Easy       | `config.json`     | Port set to `9999` instead of `3000` |
+ | 2 | Medium     | `routes/users.js` | Missing `)` causes SyntaxError crash |
+ | 3 | Hard       | `routes/data.js`  | Missing `await` causes HTTP 500      |
+
+ **Goal:** Fix all bugs so these endpoints return HTTP 200:
+ - `GET /health` returns `{"status": "ok"}`
+ - `GET /api/users` returns `{"users": [...]}`
+ - `GET /api/data` returns `{"records": [...]}`
+
+ ---
+
+ ## Scoring (Partial Rewards)
+
+ The grader runs **after every command** and awards cumulative points:
+
+ | Milestone                        | Points | Total    |
+ |----------------------------------|--------|----------|
+ | App starts on port 3000          | +0.35  | 0.35     |
+ | `/health` returns 200            | +0.10  | 0.45     |
+ | `/api/users` returns valid JSON  | +0.15  | 0.60     |
+ | `/api/data` returns valid JSON   | +0.25  | 0.85     |
+ | All endpoints correct            | +0.15  | **1.00** |
+
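The cumulative scheme in the table can be sketched as a simple milestone sum. This is an illustrative sketch only: the milestone keys below are invented for the example, and the actual grader lives in `server/devops_sandbox_environment.py`; only the point values come from the table.

```python
# Hypothetical sketch of the cumulative scoring scheme from the table above.
# Milestone names are invented for illustration; the point values match the
# README. The shipped grader is in server/devops_sandbox_environment.py.
MILESTONES = [
    ("app_on_3000", 0.35),  # App starts on port 3000
    ("health_200", 0.10),   # /health returns 200
    ("users_json", 0.15),   # /api/users returns valid JSON
    ("data_json", 0.25),    # /api/data returns valid JSON
    ("all_correct", 0.15),  # every endpoint correct
]


def grader_score(passed: set) -> float:
    """Sum the points for every milestone currently passing (0.0 to 1.0)."""
    return round(sum(pts for name, pts in MILESTONES if name in passed), 2)
```

Because the points are additive, a partially fixed app still earns reward after every command, which gives an RL agent a denser signal than a single pass/fail check at the end.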
+ ---
+
+ ## Getting Started
+
+ ### Prerequisites
+
+ - **Python 3.10+**
+ - **Docker Desktop** (running)
+ - **uv** package manager (`pip install uv`)
+
+ ### 1. Install Dependencies
+
+ ```bash
+ cd devops_sandbox
+ uv sync
+ ```
+
+ ### 2. Build the Sandbox Docker Image
+
+ ```bash
+ docker build -t devops-sandbox-node:latest -f simulated_app/Dockerfile simulated_app/
+ ```
+
+ ### 3. Start the Environment Server
+
+ ```bash
+ uv run server
+ ```
+
+ The server starts at `http://localhost:8000`.
+
+ ### 4. Run the Baseline Agent
+
+ In a **separate terminal**:
+
+ ```bash
+ # Set your OpenAI API key
+ export OPENAI_API_KEY="sk-..."     # Linux/Mac
+ $env:OPENAI_API_KEY = "sk-..."     # PowerShell
+
+ # Run the baseline
+ uv run python baseline.py
+ ```
+
+ ---
+
+ ## Test Your Own Agent
+
+ ### Option A: Use the Python Client
+
+ ```python
+ from devops_sandbox import BashAction, DevopsSandboxEnv
+
+ with DevopsSandboxEnv(base_url="http://localhost:8000").sync() as env:
+     # Reset creates a fresh Docker container
+     result = env.reset()
+     print(result.observation.stdout)        # Task description
+     print(result.observation.grader_score)  # 0.0
+
+     # Send bash commands
+     result = env.step(BashAction(command="cat /app/config.json"))
+     print(result.observation.stdout)        # File contents
+     print(result.observation.grader_score)  # Score after grading
+
+     # Fix a bug
+     result = env.step(BashAction(command="sed -i 's/9999/3000/' /app/config.json"))
+     print(result.observation.grader_score)  # Partial score
+
+     # Check if done
+     if result.done:
+         print("Episode complete!")
+ ```
+
+ ### Option B: Use the REST API Directly
+
+ ```bash
+ # Reset the environment
+ curl -X POST http://localhost:8000/reset
+
+ # Send a command
+ curl -X POST http://localhost:8000/step \
+   -H "Content-Type: application/json" \
+   -d '{"action": {"command": "ls -la /app"}}'
+ ```
+
+ ### Option C: Use the WebSocket Endpoint
+
+ Connect to `ws://localhost:8000/ws` for persistent sessions.
+
+ ---
+
+ ## Project Structure
+
+ ```
+ devops_sandbox/
+ |-- openenv.yaml                      # OpenEnv manifest
+ |-- pyproject.toml                    # Python dependencies
+ |-- README.md                         # This file
+ |-- baseline.py                       # LLM-powered baseline agent
+ |-- models.py                         # BashAction & TerminalObservation schemas
+ |-- client.py                         # Python client for the environment
+ |
+ |-- server/
+ |   |-- app.py                        # FastAPI server (entry point)
+ |   +-- devops_sandbox_environment.py # Environment logic + grader
+ |
+ +-- simulated_app/                    # The broken Node.js app (Docker context)
+     |-- Dockerfile                    # node:20-slim sandbox container
+     |-- package.json                  # Express.js project
+     |-- server.js                     # Main entry point
+     |-- config.json                   # Bug 1: wrong port
+     +-- routes/
+         |-- users.js                  # Bug 2: syntax error
+         +-- data.js                   # Bug 3: missing await
+ ```
+
+ ---
+
+ ## How It Works
+
+ ```
+ +-----------+   BashAction    +------------+   docker exec   +--------------+
+ |   Agent   | --------------> |  OpenEnv   | --------------> |    Docker    |
+ | (LLM/RL)  |                 |   Server   |                 |  Container   |
+ |           | <-------------- |   (8000)   | <-------------- | (broken app) |
+ +-----------+  Observation    +-----+------+  stdout/stderr  +--------------+
+                + grader_score       |
+                               +-----+------+
+                               |   Grader   |
+                               | (curl test |
+                               |  endpoints)|
+                               +------------+
+ ```
+
+ 1. **Agent** sends a `BashAction` (e.g., `cat /app/config.json`)
+ 2. **Server** runs it inside the Docker container via `docker exec`
+ 3. **Grader** restarts the Node app and curls all endpoints
+ 4. **Observation** returns: stdout, stderr, score (0.0-1.0), feedback
+
+ ---
+
+ ## Configuration
+
+ | Env Variable         | Default                 | Description                    |
+ |----------------------|-------------------------|--------------------------------|
+ | `OPENAI_API_KEY`     | *(required)*            | OpenAI API key for baseline    |
+ | `OPENAI_MODEL`       | `gpt-4o-mini`           | LLM model to use               |
+ | `OPENAI_BASE_URL`    | *(OpenAI default)*      | Custom endpoint (Ollama, vLLM) |
+ | `MAX_TURNS`          | `30`                    | Max steps per episode          |
+ | `DEVOPS_SANDBOX_URL` | `http://localhost:8000` | Environment server URL         |
+
+ ### Use with Local LLMs (Ollama, vLLM)
+
+ ```bash
+ export OPENAI_BASE_URL="http://localhost:11434/v1"
+ export OPENAI_MODEL="llama3"
+ export OPENAI_API_KEY="dummy"
+ uv run python baseline.py
+ ```
+
+ ---
+
+ ## Validation
+
+ ```bash
+ uv run openenv validate
+ # Expected: [OK] devops_sandbox: Ready for multi-mode deployment
+ ```
+
+ ---
+
+ ## License
+
+ BSD-style license. See LICENSE for details.
__init__.py ADDED
@@ -0,0 +1,16 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Self-Healing DevOps Sandbox Environment."""
+
+ from .client import DevopsSandboxEnv
+ from .models import BashAction, TerminalObservation
+
+ __all__ = [
+     "BashAction",
+     "TerminalObservation",
+     "DevopsSandboxEnv",
+ ]
_cli_path.txt ADDED
@@ -0,0 +1 @@
+ E:\programs2\openenv(RL)\devops_sandbox\.venv\Lib\site-packages\openenv\cli
_path.txt ADDED
@@ -0,0 +1 @@
+ E:\programs2\openenv(RL)\devops_sandbox\.venv\Lib\site-packages\openenv
baseline.py ADDED
@@ -0,0 +1,203 @@
+ #!/usr/bin/env python3
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Baseline inference script for the Self-Healing DevOps Sandbox.
+
+ Uses an LLM (via the OpenAI-compatible API) to diagnose and fix a broken
+ Node.js backend running inside a Docker container.
+
+ Usage:
+     export OPENAI_API_KEY="sk-..."
+     python baseline.py
+
+     # Or with a custom endpoint (e.g., local vLLM):
+     export OPENAI_BASE_URL="http://localhost:8080/v1"
+     python baseline.py
+ """
+
+ import json
+ import os
+ import sys
+
+ try:
+     from openai import OpenAI
+ except ImportError:
+     print("ERROR: 'openai' package is required. Install with: pip install openai")
+     sys.exit(1)
+
+ from devops_sandbox import BashAction, DevopsSandboxEnv
+
+ # ---------------------------------------------------------------------------
+ # Configuration
+ # ---------------------------------------------------------------------------
+ ENV_URL = os.getenv("DEVOPS_SANDBOX_URL", "http://localhost:8000")
+ MODEL = os.getenv("OPENAI_MODEL", "gpt-4o-mini")
+ MAX_TURNS = int(os.getenv("MAX_TURNS", "30"))
+
+ SYSTEM_PROMPT = """\
+ You are an expert DevOps engineer and Node.js developer.
+
+ You have been dropped into a Linux container with a broken Express.js backend in /app.
+ Your goal is to diagnose and fix ALL bugs so the app runs correctly.
+
+ RULES:
+ 1. Respond ONLY with a JSON object: {"command": "<bash command>"}
+ 2. Use standard bash/Linux commands (ls, cat, grep, sed, node, npm, etc.)
+ 3. Do NOT use interactive editors (vi, nano). Use sed or echo/cat with redirection.
+ 4. After fixing bugs, restart the app with: cd /app && npm start &
+ 5. Be methodical: read files first, understand the bug, then fix it.
+
+ EXPECTED FINAL STATE:
+ - App starts without errors on port 3000
+ - GET /health → 200
+ - GET /api/users → 200 with JSON containing "users" array
+ - GET /api/data → 200 with JSON containing "records" array
+ """
+
+
+ def extract_command(llm_response: str) -> str:
+     """Extract a bash command from the LLM's response (JSON or raw text)."""
+     # Try JSON parsing first
+     try:
+         data = json.loads(llm_response.strip())
+         if isinstance(data, dict) and "command" in data:
+             return data["command"]
+     except (json.JSONDecodeError, TypeError):
+         pass
+
+     # Try extracting from markdown code block
+     if "```" in llm_response:
+         lines = llm_response.split("```")
+         for block in lines[1::2]:  # odd indices are code blocks
+             code = block.strip()
+             if code.startswith("json"):
+                 code = code[4:].strip()
+                 try:
+                     data = json.loads(code)
+                     if isinstance(data, dict) and "command" in data:
+                         return data["command"]
+                 except (json.JSONDecodeError, TypeError):
+                     pass
+             elif code.startswith("bash") or code.startswith("sh"):
+                 code = code.split("\n", 1)[-1].strip()
+                 return code
+             else:
+                 first_line = code.split("\n")[0].strip()
+                 if first_line:
+                     return first_line
+
+     # Fallback: treat entire response as a command
+     cmd = llm_response.strip().strip("`").strip()
+     if cmd.startswith("{"):
+         # One more try
+         try:
+             return json.loads(cmd)["command"]
+         except Exception:
+             pass
+     return cmd
+
+
+ def main():
+     print("=" * 60)
+     print("  Self-Healing DevOps Sandbox — Baseline Agent")
+     print("=" * 60)
+
+     client = OpenAI()
+
+     messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+
+     with DevopsSandboxEnv(base_url=ENV_URL).sync() as env:
+         # Reset the environment
+         print("\n[*] Resetting environment...")
+         result = env.reset()
+         obs = result.observation
+
+         print(f"\n[INIT] Task prompt:\n{obs.stdout[:500]}...")
+         print(f"[INIT] Score: {obs.grader_score} | Feedback: {obs.grader_feedback}")
+
+         # Add initial observation to messages
+         messages.append({
+             "role": "user",
+             "content": (
+                 f"Here is the initial state of the broken app:\n\n"
+                 f"```\n{obs.stdout}\n```\n\n"
+                 f"Current directory: {obs.current_dir}\n"
+                 f"Score: {obs.grader_score}/1.0\n\n"
+                 f"What bash command should I run first?"
+             ),
+         })
+
+         for turn in range(1, MAX_TURNS + 1):
+             print(f"\n{'─' * 40}")
+             print(f"Turn {turn}/{MAX_TURNS}")
+             print(f"{'─' * 40}")
+
+             # Get LLM response
+             try:
+                 response = client.chat.completions.create(
+                     model=MODEL,
+                     messages=messages,
+                     temperature=0.2,
+                     max_tokens=256,
+                 )
+                 llm_text = response.choices[0].message.content or ""
+             except Exception as e:
+                 print(f"[ERROR] LLM call failed: {e}")
+                 break
+
+             # Extract command
+             command = extract_command(llm_text)
+             if not command:
+                 print("[WARN] Could not extract command from LLM response")
+                 command = "ls -la /app"
+
+             print(f"[CMD] {command}")
+
+             # Execute in environment
+             result = env.step(BashAction(command=command))
+             obs = result.observation
+
+             stdout_preview = obs.stdout[:300] if obs.stdout else "(empty)"
+             stderr_preview = obs.stderr[:200] if obs.stderr else "(none)"
+             print(f"[OUT] {stdout_preview}")
+             if obs.stderr:
+                 print(f"[ERR] {stderr_preview}")
+             print(f"[SCORE] {obs.grader_score:.2f} | {obs.grader_feedback}")
+
+             # Add to conversation
+             messages.append({"role": "assistant", "content": llm_text})
+             messages.append({
+                 "role": "user",
+                 "content": (
+                     f"Command output:\n"
+                     f"stdout:\n```\n{obs.stdout}\n```\n"
+                     f"stderr:\n```\n{obs.stderr}\n```\n"
+                     f"Current score: {obs.grader_score}/1.0\n"
+                     f"Grader feedback: {obs.grader_feedback}\n\n"
+                     f"What command should I run next?"
+                 ),
+             })
+
+             # Check if done
+             if result.done:
+                 print(f"\n{'=' * 60}")
+                 if obs.grader_score >= 1.0:
+                     print("  ✅ ALL BUGS FIXED — PERFECT SCORE!")
+                 else:
+                     print(f"  Episode ended. Final score: {obs.grader_score:.2f}/1.0")
+                 print(f"{'=' * 60}")
+                 break
+         else:
+             print(f"\n[!] Max turns ({MAX_TURNS}) reached.")
+             print(f"    Final score: {obs.grader_score:.2f}/1.0")
+
+     print("\n[*] Done.")
+
+
+ if __name__ == "__main__":
+     main()
client.py ADDED
@@ -0,0 +1,65 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Self-Healing DevOps Sandbox Environment Client."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import BashAction, TerminalObservation
+
+
+ class DevopsSandboxEnv(
+     EnvClient[BashAction, TerminalObservation, State]
+ ):
+     """
+     Client for the Self-Healing DevOps Sandbox Environment.
+
+     Example:
+         >>> with DevopsSandboxEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.stdout)
+         ...
+         ...     result = client.step(BashAction(command="ls -la"))
+         ...     print(result.observation.stdout)
+     """
+
+     def _step_payload(self, action: BashAction) -> Dict:
+         """Convert BashAction to JSON payload for step message."""
+         return {
+             "command": action.command,
+         }
+
+     def _parse_result(self, payload: Dict) -> StepResult[TerminalObservation]:
+         """Parse server response into StepResult[TerminalObservation]."""
+         obs_data = payload.get("observation", {})
+         observation = TerminalObservation(
+             stdout=obs_data.get("stdout", ""),
+             stderr=obs_data.get("stderr", ""),
+             current_dir=obs_data.get("current_dir", "/app"),
+             task_id=obs_data.get("task_id", "devops_sandbox"),
+             grader_score=obs_data.get("grader_score", 0.0),
+             grader_feedback=obs_data.get("grader_feedback", ""),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """Parse server response into State object."""
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
devops_troubleshooting_sandbox.egg-info/PKG-INFO ADDED
@@ -0,0 +1,7 @@
+ Metadata-Version: 2.4
+ Name: devops-troubleshooting-sandbox
+ Version: 0.1.0
+ Summary: A real-world DevOps environment where an agent must diagnose and fix broken backend configurations inside an isolated Docker container.
+ Requires-Python: >=3.10
+ Requires-Dist: openenv-core[core]>=0.2.1
+ Requires-Dist: openai>=1.0.0
devops_troubleshooting_sandbox.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,18 @@
+ README.md
+ __init__.py
+ client.py
+ models.py
+ pyproject.toml
+ ./__init__.py
+ ./baseline.py
+ ./client.py
+ ./models.py
+ devops_troubleshooting_sandbox.egg-info/PKG-INFO
+ devops_troubleshooting_sandbox.egg-info/SOURCES.txt
+ devops_troubleshooting_sandbox.egg-info/dependency_links.txt
+ devops_troubleshooting_sandbox.egg-info/entry_points.txt
+ devops_troubleshooting_sandbox.egg-info/requires.txt
+ devops_troubleshooting_sandbox.egg-info/top_level.txt
+ server/__init__.py
+ server/app.py
+ server/devops_sandbox_environment.py
devops_troubleshooting_sandbox.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
+
devops_troubleshooting_sandbox.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ server = devops_sandbox.server.app:main
devops_troubleshooting_sandbox.egg-info/requires.txt ADDED
@@ -0,0 +1,2 @@
+ openenv-core[core]>=0.2.1
+ openai>=1.0.0
devops_troubleshooting_sandbox.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ devops_sandbox
models.py ADDED
@@ -0,0 +1,70 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Data models for the Self-Healing DevOps Sandbox Environment.
+
+ Defines the Action and Observation types used by the RL agent to interact
+ with a broken Node.js backend running inside a Docker container.
+ """
+
+ from typing import Any, Dict
+
+ from pydantic import Field
+
+ from openenv.core.env_server.types import Action, Observation
+
+
+ class BashAction(Action):
+     """Action: a bash command to execute inside the Docker sandbox.
+
+     The agent sends shell commands (ls, cat, sed, node, etc.) to diagnose
+     and repair the broken Node.js application.
+     """
+
+     command: str = Field(
+         ...,
+         description=(
+             "The bash command to execute in the sandbox terminal "
+             "(e.g., 'ls -la', 'cat server.js', "
+             "'sed -i s/old/new/ file.js')."
+         ),
+     )
+
+
+ class TerminalObservation(Observation):
+     """Observation returned after executing a bash command.
+
+     Includes stdout/stderr from the command, working directory context,
+     the current task identifier, and the grader's partial score.
+     """
+
+     stdout: str = Field(
+         default="",
+         description="Standard output from the executed command.",
+     )
+     stderr: str = Field(
+         default="",
+         description="Standard error from the executed command, if any.",
+     )
+     current_dir: str = Field(
+         default="/app",
+         description="The current working directory inside the container.",
+     )
+     task_id: str = Field(
+         default="devops_sandbox",
+         description="Identifier for the current task scenario.",
+     )
+     grader_score: float = Field(
+         default=0.0,
+         ge=0.0,
+         le=1.0,
+         description="The grader's partial reward (0.0 to 1.0).",
+     )
+     grader_feedback: str = Field(
+         default="",
+         description="Human-readable feedback from the grader.",
+     )
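As a rough illustration of the schema above, the observation can be mirrored by a dependency-free stand-in that makes the `ge=0.0, le=1.0` bound on `grader_score` explicit. This is a sketch only: a plain dataclass replaces openenv's pydantic `Observation` base class, so it is not the shipped model.

```python
from dataclasses import dataclass


# Dependency-free stand-in for the TerminalObservation schema in models.py.
# A plain dataclass replaces openenv's pydantic Observation base class; the
# field names and defaults mirror models.py, and __post_init__ reproduces
# the ge=0.0 / le=1.0 constraint on grader_score.
@dataclass
class TerminalObservationSketch:
    stdout: str = ""
    stderr: str = ""
    current_dir: str = "/app"
    task_id: str = "devops_sandbox"
    grader_score: float = 0.0
    grader_feedback: str = ""

    def __post_init__(self) -> None:
        if not 0.0 <= self.grader_score <= 1.0:
            raise ValueError("grader_score must be between 0.0 and 1.0")
```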
openenv.yaml ADDED
@@ -0,0 +1,7 @@
+ spec_version: 1
+ name: devops_sandbox
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
openenv_devops_sandbox.egg-info/PKG-INFO ADDED
@@ -0,0 +1,9 @@
+ Metadata-Version: 2.4
+ Name: openenv-devops_sandbox
+ Version: 0.1.0
+ Summary: Devops Sandbox environment for OpenEnv
+ Requires-Python: >=3.10
+ Requires-Dist: openenv-core[core]>=0.2.1
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_devops_sandbox.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,14 @@
+ README.md
+ pyproject.toml
+ ./__init__.py
+ ./client.py
+ ./models.py
+ openenv_devops_sandbox.egg-info/PKG-INFO
+ openenv_devops_sandbox.egg-info/SOURCES.txt
+ openenv_devops_sandbox.egg-info/dependency_links.txt
+ openenv_devops_sandbox.egg-info/entry_points.txt
+ openenv_devops_sandbox.egg-info/requires.txt
+ openenv_devops_sandbox.egg-info/top_level.txt
+ server/__init__.py
+ server/app.py
+ server/devops_sandbox_environment.py
openenv_devops_sandbox.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
+
openenv_devops_sandbox.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ server = devops_sandbox.server.app:main
openenv_devops_sandbox.egg-info/requires.txt ADDED
@@ -0,0 +1,5 @@
+ openenv-core[core]>=0.2.1
+
+ [dev]
+ pytest>=8.0.0
+ pytest-cov>=4.0.0
openenv_devops_sandbox.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ devops_sandbox
pyproject.toml ADDED
@@ -0,0 +1,22 @@
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "devops-troubleshooting-sandbox"
+ version = "0.1.0"
+ description = "A real-world DevOps environment where an agent must diagnose and fix broken backend configurations inside an isolated Docker container."
+ requires-python = ">=3.10"
+ dependencies = [
+     "openenv-core[core]>=0.2.1",
+     "openai>=1.0.0",
+ ]
+
+ [project.scripts]
+ # This line fixes the "Missing [project.scripts]" error
+ server = "devops_sandbox.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["devops_sandbox", "devops_sandbox.server"]
+ package-dir = { "devops_sandbox" = ".", "devops_sandbox.server" = "server" }
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Self-Healing DevOps Sandbox environment server components."""
+
+ from .devops_sandbox_environment import DevOpsSandbox
+
+ __all__ = ["DevOpsSandbox"]
server/app.py ADDED
@@ -0,0 +1,59 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ FastAPI application for the Self-Healing DevOps Sandbox Environment.
+
+ Endpoints:
+     - POST /reset: Reset the environment (build & start container)
+     - POST /step: Execute a bash command inside the container
+     - GET /state: Get current environment state
+     - GET /schema: Get action/observation schemas
+     - WS /ws: WebSocket endpoint for persistent sessions
+
+ Usage:
+     uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+ """
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:  # pragma: no cover
+     raise ImportError(
+         "openenv is required. Install with:\n  uv sync\n"
+     ) from e
+
+ try:
+     from ..models import BashAction, TerminalObservation
+     from .devops_sandbox_environment import DevOpsSandbox
+ except (ImportError, ModuleNotFoundError):
+     from models import BashAction, TerminalObservation
+     from server.devops_sandbox_environment import DevOpsSandbox
+
+
+ # Create the app — DevOpsSandbox is passed as a class (factory mode)
+ app = create_app(
+     DevOpsSandbox,
+     BashAction,
+     TerminalObservation,
+     env_name="devops_sandbox",
+     max_concurrent_envs=1,
+ )
+
+
+ def main(host: str = "0.0.0.0", port: int = 8000):
+     """
+     Entry point for direct execution.
+
+         uv run --project . server
+         python -m devops_sandbox.server.app
+     """
+     import uvicorn
+
+     uvicorn.run(app, host=host, port=port)
+
+
+ if __name__ == "__main__":
+     main()
server/devops_sandbox_environment.py ADDED
@@ -0,0 +1,450 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Self-Healing DevOps Sandbox — Environment Implementation.
+
+ Spins up an isolated Docker container with a broken Node.js backend.
+ The RL agent executes bash commands to diagnose and fix 3 bugs.
+ A programmatic grader awards partial credit (0.0 → 1.0) after every step.
+ """
+
+ import logging
+ import os
+ import subprocess
+ import time
+ from pathlib import Path
+ from typing import Any, Optional
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+ from openenv.core.env_server.types import State
+
+ try:
+     from ..models import BashAction, TerminalObservation
+ except ImportError:
+     from models import BashAction, TerminalObservation
+
+ logger = logging.getLogger(__name__)
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+ CONTAINER_NAME_PREFIX = "devops_sandbox_"
+ IMAGE_NAME = "devops-sandbox-node:latest"
+ EXPECTED_PORT = 3000  # The port the fixed app should listen on
+ MAX_STEPS = 50  # Episode budget
+ SIMULATED_APP_DIR = Path(__file__).resolve().parent.parent / "simulated_app"
+
+
+ class DevOpsSandbox(Environment):
+     """
+     RL environment: fix a broken Node.js backend inside a Docker container.
+
+     reset() → build image (if needed) + start container + return initial obs
+     step()  → docker exec the agent's command + run grader → obs + reward
+     close() → tear down container
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = False
+
+     # ------------------------------------------------------------------
+     # Lifecycle
+     # ------------------------------------------------------------------
+     def __init__(self):
+         super().__init__()
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+         self._container_name: Optional[str] = None
+         self._container_running: bool = False
+         self._current_dir: str = "/app"
+         self._last_score: float = 0.0
+
+     # ------------------------------------------------------------------
+     # reset
+     # ------------------------------------------------------------------
+     def reset(
+         self,
+         seed: Optional[int] = None,
+         episode_id: Optional[str] = None,
+         **kwargs: Any,
+     ) -> TerminalObservation:
+         """Build the Docker image, start the container, return the task prompt."""
+         # Cleanup previous episode
+         self._cleanup_container()
+
+         # New episode
+         eid = episode_id or str(uuid4())
+         self._state = State(episode_id=eid, step_count=0)
+         self._last_score = 0.0
+         self._current_dir = "/app"
+
+         # Build image (idempotent — Docker caches layers)
+         self._build_image()
+
+         # Start container
+         self._container_name = f"{CONTAINER_NAME_PREFIX}{eid[:8]}"
+         self._start_container()
+
+         # Inject the grader script into the container
+         self._inject_grader_script()
+
+         # Gather initial observation
+         init_stdout = self._docker_exec("ls -la /app && echo '---' && cat /app/config.json")
+
+         task_prompt = (
+             "=== SELF-HEALING DEVOPS SANDBOX ===\n"
+             "You have been dropped into a Docker container with a broken Node.js "
+             "Express backend in /app.\n\n"
+             "YOUR MISSION: Diagnose and fix ALL bugs so that:\n"
+             "  1. The app starts without errors on port 3000\n"
+             "  2. GET /health returns HTTP 200\n"
+             "  3. GET /api/users returns HTTP 200 with valid JSON\n"
+             "  4. GET /api/data returns HTTP 200 with valid JSON\n\n"
+             "HINTS:\n"
+             "  - Check config files for wrong settings\n"
+             "  - Look for syntax errors that prevent startup\n"
+             "  - Watch out for async/await issues\n\n"
+             "Use bash commands to explore, edit files, and test.\n"
+             "When you think you've fixed everything, run: npm start\n\n"
+             "--- INITIAL DIRECTORY LISTING ---\n"
+             f"{init_stdout}\n"
+         )
+
+         return TerminalObservation(
+             stdout=task_prompt,
+             stderr="",
+             current_dir=self._current_dir,
+             task_id="devops_sandbox",
+             grader_score=0.0,
+             grader_feedback="Episode started. Fix the bugs!",
+             done=False,
+             reward=0.0,
+         )
+
+     # ------------------------------------------------------------------
+     # step
+     # ------------------------------------------------------------------
+     def step(
+         self,
+         action: BashAction,  # type: ignore[override]
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> TerminalObservation:
+         """Execute the agent's bash command, run grader, return observation."""
+         self._state.step_count += 1
+
+         if not self._container_running:
+             return TerminalObservation(
+                 stdout="",
+                 stderr="ERROR: Container is not running. Call reset() first.",
+                 current_dir=self._current_dir,
+                 task_id="devops_sandbox",
+                 grader_score=0.0,
+                 grader_feedback="Container not running.",
+                 done=True,
+                 reward=0.0,
+             )
+
+         # Execute the command
+         command = action.command.strip()
+         if not command:
+             return TerminalObservation(
+                 stdout="",
+                 stderr="Empty command. Please provide a bash command.",
+                 current_dir=self._current_dir,
+                 task_id="devops_sandbox",
+                 grader_score=self._last_score,
+                 grader_feedback="No command executed.",
+                 done=False,
+                 reward=self._last_score,
+             )
+
+         try:
+             timeout = timeout_s or 30.0
+             stdout, stderr = self._docker_exec_split(command, timeout=timeout)
+         except Exception as e:
+             stdout, stderr = "", f"Command execution error: {e}"
+
+         # Run the grader
+         score, feedback = self._grade()
+         self._last_score = score
+
+         episode_done = (score >= 1.0) or (self._state.step_count >= MAX_STEPS)
+
+         return TerminalObservation(
+             stdout=stdout,
+             stderr=stderr,
+             current_dir=self._current_dir,
+             task_id="devops_sandbox",
+             grader_score=score,
+             grader_feedback=feedback,
+             done=episode_done,
+             reward=score,
+         )
+
+     # ------------------------------------------------------------------
+     # state
+     # ------------------------------------------------------------------
+     @property
+     def state(self) -> State:
+         return self._state
+
+     # ------------------------------------------------------------------
+     # close
+     # ------------------------------------------------------------------
+     def close(self) -> None:
+         self._cleanup_container()
+
+     # ==================================================================
+     # GRADER — partial reward (0.0 → 1.0)
+     # The grader script is injected as a file into the container at
+     # reset() time, then executed via `bash /tmp/grader.sh` to avoid
+     # Windows subprocess escaping issues with complex bash scripts.
+     # ==================================================================
+     def _inject_grader_script(self) -> None:
+         """Write the grader bash script into the container as /tmp/grader.sh."""
+         # Use a heredoc via docker exec to write the file
+         # We write it line-by-line to avoid any escaping issues
+         lines = [
+             '#!/bin/bash',
+             'set -m',
+             '',
+             'pkill -f "node server.js" 2>/dev/null',
+             'sleep 0.5',
+             '',
+             'cd /app',
+             'node server.js > /tmp/node.log 2>&1 &',
+             'NODE_PID=$!',
+             '',
+             'for i in 1 2 3 4; do',
+             '  sleep 1',
+             '  if curl -s http://localhost:3000/health > /dev/null 2>&1; then',
+             '    break',
+             '  fi',
+             'done',
+             '',
+             'STARTUP_LOG=$(cat /tmp/node.log 2>/dev/null)',
+             '',
+             "HEALTH_CODE=$(curl -s -o /tmp/health.json -w '%{http_code}' http://localhost:3000/health 2>/dev/null)",
+             "USERS_CODE=$(curl -s -o /tmp/users.json -w '%{http_code}' http://localhost:3000/api/users 2>/dev/null)",
+             "DATA_CODE=$(curl -s -o /tmp/data.json -w '%{http_code}' http://localhost:3000/api/data 2>/dev/null)",
+             'USERS_BODY=$(cat /tmp/users.json 2>/dev/null)',
+             'DATA_BODY=$(cat /tmp/data.json 2>/dev/null)',
+             '',
+             'kill $NODE_PID 2>/dev/null',
+             'wait $NODE_PID 2>/dev/null',
+             '',
+             'echo "GRADER_STARTUP_LOG:${STARTUP_LOG}"',
+             'echo "GRADER_HEALTH_CODE:${HEALTH_CODE}"',
+             'echo "GRADER_USERS_CODE:${USERS_CODE}"',
+             'echo "GRADER_DATA_CODE:${DATA_CODE}"',
+             'echo "GRADER_USERS_BODY:${USERS_BODY}"',
+             'echo "GRADER_DATA_BODY:${DATA_BODY}"',
+         ]
+         script_content = '\n'.join(lines) + '\n'
+
+         # Write via docker cp using a temp file on the host
+         import tempfile
+         with tempfile.NamedTemporaryFile(
+             mode='w', suffix='.sh', delete=False, newline='\n'
+         ) as f:
+             f.write(script_content)
+             tmp_path = f.name
+
+         try:
+             subprocess.run(
+                 ["docker", "cp", tmp_path, f"{self._container_name}:/tmp/grader.sh"],
+                 check=True,
+                 capture_output=True,
+                 timeout=10,
+             )
+             self._docker_exec("chmod +x /tmp/grader.sh")
+         finally:
+             os.unlink(tmp_path)
+
+     def _grade(self) -> tuple:
+         """
+         Run the grader script inside the container.
+         Returns (score: float, feedback: str).
+         """
+         score = 0.0
+         feedback_parts = []
+
+         try:
+             raw = self._docker_exec("bash /tmp/grader.sh", timeout=20.0)
+
+             # Parse structured output
+             results = {}
+             for line in raw.splitlines():
+                 if line.startswith("GRADER_"):
+                     key, _, value = line.partition(":")
+                     results[key] = value.strip()
+
+             startup_log = results.get("GRADER_STARTUP_LOG", "")
+             health_code = results.get("GRADER_HEALTH_CODE", "000")
+             users_code = results.get("GRADER_USERS_CODE", "000")
+             data_code = results.get("GRADER_DATA_CODE", "000")
+             users_body = results.get("GRADER_USERS_BODY", "")
+             data_body = results.get("GRADER_DATA_BODY", "")
+
+             # --- Check 1: App starts on correct port ---
+             has_syntax_error = "SyntaxError" in startup_log
+             has_crash = (has_syntax_error
+                          or "Cannot find module" in startup_log
+                          or "ReferenceError" in startup_log)
+             app_listening = f"Server running on port {EXPECTED_PORT}" in startup_log
+
+             if has_crash and not app_listening:
+                 feedback_parts.append("✗ App crashes on startup")
+                 if has_syntax_error:
+                     feedback_parts.append("(SyntaxError detected)")
+                 return (score, " | ".join(feedback_parts))
+
+             if app_listening:
+                 score += 0.35
+                 feedback_parts.append("✓ App starts on port 3000 (+0.35)")
+             else:
+                 feedback_parts.append("✗ App not listening on port 3000")
+                 return (score, " | ".join(feedback_parts))
+
+             # --- Check 2: /health ---
+             if health_code == "200":
+                 score += 0.10
+                 feedback_parts.append("✓ /health returns 200 (+0.10)")
+             else:
+                 feedback_parts.append(f"✗ /health returned {health_code}")
+
+             # --- Check 3: /api/users ---
+             if users_code == "200":
+                 if '"users"' in users_body:
+                     score += 0.15
+                     feedback_parts.append("✓ /api/users returns valid JSON (+0.15)")
+                 else:
+                     score += 0.05
+                     feedback_parts.append("~ /api/users 200 but bad body (+0.05)")
+             else:
+                 feedback_parts.append(f"✗ /api/users returned {users_code}")
+
+             # --- Check 4: /api/data ---
+             if data_code == "200":
+                 if '"records"' in data_body:
+                     score += 0.25
+                     feedback_parts.append("✓ /api/data returns valid JSON (+0.25)")
+                 else:
+                     score += 0.05
+                     feedback_parts.append("~ /api/data 200 but bad body (+0.05)")
+             else:
+                 feedback_parts.append(f"✗ /api/data returned {data_code}")
+
+             # --- Check 5: all endpoints correct ---
+             if score >= 0.85:
+                 score = min(score + 0.15, 1.0)
+                 feedback_parts.append("✓ All endpoints healthy — FULL SCORE (+0.15)")
+
+         except Exception as exc:
+             logger.exception("Grader error")
+             feedback_parts.append(f"Grader error (score preserved): {exc}")
+
+         score = round(min(max(score, 0.0), 1.0), 2)
+         return (score, " | ".join(feedback_parts))
+
+     # ==================================================================
+     # DOCKER HELPERS
+     # ==================================================================
+     def _build_image(self) -> None:
+         """Build the sandbox Docker image from simulated_app/."""
+         try:
+             logger.info("Building Docker image %s …", IMAGE_NAME)
+             subprocess.run(
+                 ["docker", "build", "-t", IMAGE_NAME, "."],
+                 cwd=str(SIMULATED_APP_DIR),
+                 check=True,
+                 capture_output=True,
+                 timeout=120,
+             )
+             logger.info("Docker image built successfully.")
+         except subprocess.CalledProcessError as e:
+             logger.error("Docker build failed: %s", e.stderr.decode(errors="replace"))
+             raise RuntimeError(f"Docker build failed: {e.stderr.decode(errors='replace')}") from e
+         except FileNotFoundError:
+             raise RuntimeError(
+                 "Docker CLI not found. Ensure Docker is installed and on PATH."
+             )
+
+     def _start_container(self) -> None:
+         """Run the sandbox container in detached mode."""
+         try:
+             # Remove stale container with same name
+             subprocess.run(
+                 ["docker", "rm", "-f", self._container_name],
+                 capture_output=True,
+                 timeout=10,
+             )
+             subprocess.run(
+                 [
+                     "docker", "run", "-d",
+                     "--init",
+                     "--name", self._container_name,
+                     IMAGE_NAME,
+                 ],
+                 check=True,
+                 capture_output=True,
+                 timeout=30,
+             )
+             self._container_running = True
+             logger.info("Container %s started.", self._container_name)
+         except subprocess.CalledProcessError as e:
+             raise RuntimeError(
+                 f"Failed to start container: {e.stderr.decode(errors='replace')}"
+             ) from e
+
+     def _docker_exec(self, cmd: str, timeout: float = 30.0) -> str:
+         """Execute a command inside the running container and return combined output."""
+         try:
+             result = subprocess.run(
+                 ["docker", "exec", self._container_name, "bash", "-c", cmd],
+                 capture_output=True,
+                 timeout=timeout,
+             )
+             out = result.stdout.decode(errors="replace")
+             err = result.stderr.decode(errors="replace")
+             return (out + err).strip()
+         except subprocess.TimeoutExpired:
+             return "[command timed out]"
+         except Exception as e:
+             return f"[docker exec error: {e}]"
+
+     def _docker_exec_split(self, cmd: str, timeout: float = 30.0) -> tuple:
+         """Execute command; return (stdout, stderr) separately."""
+         try:
+             result = subprocess.run(
+                 ["docker", "exec", self._container_name, "bash", "-c", cmd],
+                 capture_output=True,
+                 timeout=timeout,
+             )
+             return (
+                 result.stdout.decode(errors="replace"),
+                 result.stderr.decode(errors="replace"),
+             )
+         except subprocess.TimeoutExpired:
+             return ("", "[command timed out]")
+         except Exception as e:
+             return ("", f"[docker exec error: {e}]")
+
+     def _cleanup_container(self) -> None:
+         """Stop and remove the container if it exists."""
+         if self._container_name:
+             try:
+                 subprocess.run(
+                     ["docker", "rm", "-f", self._container_name],
+                     capture_output=True,
+                     timeout=15,
+                 )
+                 logger.info("Container %s removed.", self._container_name)
+             except Exception:
+                 pass
+         self._container_running = False
+         self._container_name = None
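The partial-credit rubric encoded in `_grade` (0.35 startup, 0.10 health, 0.15 users, 0.25 data, 0.15 completion bonus) can be sketched as a standalone function. This is an illustrative reconstruction, not code from the commit; it uses integer points to sidestep float accumulation, whereas the grader itself sums floats and rounds at the end:

```python
def rubric_score(listening: bool, health_200: bool,
                 users_200: bool, users_valid: bool,
                 data_200: bool, data_valid: bool) -> float:
    """Mirror the grader's partial-credit rubric using integer points."""
    if not listening:
        return 0.0  # the grader returns early when the app never binds port 3000
    points = 35                                 # app starts on port 3000
    if health_200:
        points += 10                            # GET /health → 200
    if users_200:
        points += 15 if users_valid else 5      # /api/users body check
    if data_200:
        points += 25 if data_valid else 5       # /api/data body check
    if points >= 85:                            # completion bonus
        points = min(points + 15, 100)
    return points / 100
```

For example, an episode where only startup and /health succeed scores 0.45, while fixing all three bugs reaches the 0.85 threshold and the bonus lifts the reward to 1.0.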
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
simulated_app/Dockerfile ADDED
@@ -0,0 +1,26 @@
+ # Lightweight Node.js image for the DevOps Sandbox
+ FROM node:20-slim
+
+ # Install bash, git, curl, and common debugging tools
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends \
+         bash \
+         git \
+         curl \
+         procps \
+         sed \
+         grep \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy the buggy application
+ COPY package.json /app/package.json
+ RUN npm install --production 2>/dev/null || true
+
+ COPY . /app
+
+ # The container stays alive so the agent can interact via `docker exec`
+ # The agent is responsible for starting/restarting the Node app.
+ CMD ["tail", "-f", "/dev/null"]
simulated_app/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "port": 9999,
+   "appName": "DevOps Sandbox App",
+   "version": "1.0.0"
+ }
simulated_app/package.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "name": "devops-sandbox-app",
+   "version": "1.0.0",
+   "description": "A simple Express.js backend for the DevOps Sandbox environment",
+   "main": "server.js",
+   "scripts": {
+     "start": "node server.js"
+   },
+   "dependencies": {
+     "express": "^4.18.2"
+   }
+ }
simulated_app/routes/data.js ADDED
@@ -0,0 +1,37 @@
+ const express = require('express');
+ const router = express.Router();
+
+ // Simulates fetching data from a database
+ function fetchDataFromDB() {
+   return new Promise((resolve) => {
+     setTimeout(() => {
+       resolve({
+         records: [
+           { id: 1, value: 'sensor_alpha', reading: 42.5 },
+           { id: 2, value: 'sensor_beta', reading: 17.3 },
+           { id: 3, value: 'sensor_gamma', reading: 88.1 }
+         ],
+         timestamp: new Date().toISOString()
+       });
+     }, 100);
+   });
+ }
+
+ // BUG 3 (Hard): The handler is marked async but does NOT await the Promise.
+ // This means `result` will be a pending Promise object, not the resolved data.
+ // Express will try to serialize the Promise, resulting in an empty/broken response
+ // or a 500 error when the client expects valid JSON.
+
+ router.get('/', async (req, res) => {
+   try {
+     const result = fetchDataFromDB();
+     if (!result || !result.records) {
+       return res.status(500).json({ error: 'Failed to fetch data' });
+     }
+     res.json(result);
+   } catch (err) {
+     res.status(500).json({ error: 'Internal server error' });
+   }
+ });
+
+ module.exports = router;
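Bug 3 above is an instance of a general async pitfall: calling an async producer without awaiting it yields a pending promise (or coroutine), not the resolved data. A Python analogue of the same mistake, illustrative only and not the actual Node fix (the fix in `routes/data.js` is simply `await fetchDataFromDB()`):

```python
import asyncio

async def fetch_data_from_db() -> dict:
    """Stand-in for the simulated DB call in routes/data.js."""
    await asyncio.sleep(0)
    return {"records": [1, 2, 3]}

async def buggy_handler() -> bool:
    # Missing await: `result` is a coroutine object, not the dict
    result = fetch_data_from_db()
    is_dict = isinstance(result, dict)
    result.close()  # close the never-awaited coroutine to silence the warning
    return is_dict

async def fixed_handler() -> bool:
    # Awaited: `result` is the resolved data
    result = await fetch_data_from_db()
    return isinstance(result, dict)
```

Running these, the buggy handler sees a coroutine where it expects a dict, which is why the grader observes a broken `/api/data` body.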
simulated_app/routes/users.js ADDED
@@ -0,0 +1,18 @@
+ const express = require('express');
+ const router = express.Router();
+
+ // BUG 2 (Medium): There is a syntax error below.
+ // The closing parenthesis for router.get() is missing,
+ // which will cause Node.js to crash on startup with a SyntaxError.
+
+ const users = [
+   { id: 1, name: 'Alice', email: 'alice@example.com' },
+   { id: 2, name: 'Bob', email: 'bob@example.com' },
+   { id: 3, name: 'Charlie', email: 'charlie@example.com' }
+ ];
+
+ router.get('/', (req, res) => {
+   res.json({ users: users });
+ };
+
+ module.exports = router;
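Bug 2 makes Node crash at require-time with a SyntaxError, which the grader's Check 1 detects by scanning the startup log. A minimal Python sketch of that classification step (the function name `classify_startup` is mine; the marker strings come from `_grade`):

```python
def classify_startup(log: str) -> str:
    """Classify a Node startup log the way the grader's Check 1 does."""
    crash_markers = ("SyntaxError", "Cannot find module", "ReferenceError")
    listening = "Server running on port 3000" in log
    if any(marker in log for marker in crash_markers) and not listening:
        return "crash"       # grader returns early with score 0.0
    if listening:
        return "listening"   # grader awards the 0.35 startup credit
    return "not_listening"   # e.g. app bound the wrong port (Bug 1)
```

A crash short-circuits grading entirely, so fixing Bug 2 is a prerequisite for any reward beyond zero.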
simulated_app/server.js ADDED
@@ -0,0 +1,23 @@
+ const express = require('express');
+ const config = require('./config.json');
+
+ const app = express();
+ app.use(express.json());
+
+ // Health check endpoint
+ app.get('/health', (req, res) => {
+   res.json({ status: 'ok', uptime: process.uptime() });
+ });
+
+ // Mount route modules
+ const usersRouter = require('./routes/users');
+ const dataRouter = require('./routes/data');
+
+ app.use('/api/users', usersRouter);
+ app.use('/api/data', dataRouter);
+
+ // Start server on the port from config
+ const PORT = config.port;
+ app.listen(PORT, '0.0.0.0', () => {
+   console.log(`Server running on port ${PORT}`);
+ });
uv.lock ADDED
The diff for this file is too large to render.

validation_output.txt ADDED
Binary file (110 Bytes)