Broken Dockerfiles. Misconfigured workflows. K8s pods stuck in CrashLoopBackOff. This environment throws real-world deployment failures at AI agents and measures how well they can track down the root cause and fix it.
Same loop every DevOps engineer runs through, just faster.
The agent gets broken config files (a Dockerfile, a workflow YAML, some K8s manifests) along with whatever error the pipeline spit out.
Read the error, find the bug, edit the file. Could be a typo, a wrong port, a missing secret. Up to 10 steps to get it right.
Deterministic scoring: how many issues got fixed, how quickly, and whether hints were needed. Harder tasks are graded more generously.
Ten tasks, ranging from single-typo Dockerfile fixes to multi-bug pipeline debugging across GHA + Docker + K8s:
- The classic stuff: misspelled filenames, bad base image tags, broken `RUN` continuations. Things that make `docker build` fail immediately.
- It builds fine, then crashes at runtime: missing `WORKDIR`, `CMD`/`ENTRYPOINT` conflicts, permission issues, missing env vars.
- GitHub Actions YAML that GitHub refuses to even parse: missing `runs-on`, wrong trigger format, steps without actions.
- The secret is right there in the repo settings, but the workflow can't see it: missing `env:` blocks, wrong `${{ }}` syntax, token permission gaps.
- The workflow and Dockerfile depend on each other: build context mismatches, missing buildx setup, login without secrets.
- Multi-stage builds, matrix strategies, cross-job artifacts: two or three bugs that only make sense when you look at the files together.
- Pods stuck in CrashLoopBackOff or ImagePullBackOff: OOM kills, wrong commands, missing ConfigMaps, misconfigured probes.
- Pods are running, but nobody can reach them: selector mismatches, wrong `targetPort`s, NetworkPolicies blocking traffic, missing ingress classes.
- End-to-end GHA-to-Docker-to-registry failures: GHCR tokens not wired up, image tag mismatches between build and push, missing permissions.
- The real deal: 2 to 4 bugs scattered across a GHA workflow, Dockerfile, and K8s manifests at the same time. Requires cross-file reasoning.
Deterministic, difficulty-aware scoring. Same actions, same score. Harder tasks get more room to breathe.
- Credit for each issue you fix, even if you don't get them all
- Extra credit when every single issue is resolved
- Fewer steps is better; the per-step decay is gentler on hard tasks
- Solving hard/expert tasks perfectly earns extra points
- A penalty per hint used; hints cost less on harder tasks, where leaning on them is fair
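The rules above can be sketched as a single scoring function. This is an illustrative sketch, not the environment's actual implementation: every constant, weight, and argument name below is an assumption.

```python
def score(fixed, total, steps, hints, difficulty):
    """Illustrative difficulty-aware score in [0, 1]. All constants are
    assumptions, not the environment's real weights.

    difficulty: 0 = easy .. 3 = expert. Harder tasks decay more gently
    per step and are charged less per hint.
    """
    per_issue = fixed / total                       # partial credit per fixed issue
    completion = 0.2 if fixed == total else 0.0     # bonus for resolving everything
    step_decay = 0.02 / (1 + difficulty)            # gentler decay on hard tasks
    efficiency = max(0.0, 1.0 - step_decay * steps)
    hint_cost = (0.10 / (1 + difficulty)) * hints   # hints are cheaper where they're fair
    # perfection bonus only on hard/expert tasks solved cleanly without hints
    bonus = 0.15 if (difficulty >= 2 and fixed == total and hints == 0) else 0.0
    raw = per_issue * efficiency + completion + bonus - hint_cost
    return max(0.0, min(1.0, raw))                  # same actions, same score
```

The shape matters more than the numbers: partial credit is monotone in issues fixed, step decay and hint penalties shrink as difficulty rises, and the whole thing is pure arithmetic, so identical trajectories always score identically.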
Everything you need to run episodes, grade trajectories, and inspect the environment.
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Returns `{"status": "healthy"}` |
| /metadata | GET | Environment name, version, tags |
| /tasks | GET | All 10 tasks with difficulty levels |
| /info | GET | Full task list with schemas |
| /reset | POST | Start a new episode (pick a task or get a random one) |
| /step | POST | Take an action, get back observation + reward |
| /state | GET | Current observation without acting |
| /grader | POST | Score a trajectory after the episode |
| /baseline | POST | Run the built-in heuristic baseline |
| /schema | GET | Action and observation JSON schemas |
| /mcp | POST | JSON-RPC 2.0 MCP endpoint |
| /docs | GET | Interactive Swagger docs |
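Wiring an episode loop against the endpoints above mostly means building the right JSON bodies. A minimal sketch, assuming field names (`task_id`, `action`, `tool`, `args`) that are guesses from the table, not the documented schema; check `/schema` for the real one:

```python
def reset_payload(task_id=None):
    """Body for POST /reset; omit task_id to get a random task.
    The 'task_id' field name is an assumption."""
    return {} if task_id is None else {"task_id": task_id}

def step_payload(tool, **args):
    """Body for POST /step; the 'action'/'tool'/'args' shape is an assumption."""
    return {"action": {"tool": tool, "args": args}}

def mcp_payload(method, params, req_id=1):
    """Standard JSON-RPC 2.0 envelope for POST /mcp."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# Sending one with only the standard library (host/port assumed):
#   import json, urllib.request
#   body = json.dumps(step_payload("edit_file", path="Dockerfile")).encode()
#   req = urllib.request.Request("http://localhost:8000/step", data=body,
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The payload builders are kept pure so they can be unit-tested without a running server; only the commented-out `urllib` call touches the network.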
Fix a K8s OOMKilled pod in 3 commands.
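In practice that loop is: `kubectl describe pod` to spot the `OOMKilled` reason, raise the container's memory limit in the manifest, then `kubectl apply`. A hypothetical fragment of the fix (names and values are illustrative, not from a real task):

```yaml
# deployment.yaml (hypothetical): the pod was OOMKilled because the limit
# sat below the app's working set; raise the limit and redeploy.
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "512Mi"   # was 64Mi, too small, so the kernel killed the container
```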