OpenEnv Environment · Live

Cloud-Native DevOps Debug Environment

Broken Dockerfiles. Misconfigured workflows. K8s pods stuck in CrashLoopBackOff. This environment throws real-world deployment failures at AI agents and measures how well they can track down the root cause and fix it.

10 Task Categories
50 Unique Scenarios
3 Simulators (Docker, GHA, K8s)
12 API Endpoints

How It Works

Same loop every DevOps engineer runs through, just faster.

1. Reset

Agent gets broken config files — a Dockerfile, a workflow YAML, some K8s manifests — along with whatever error the pipeline spit out.

2. Observe → Act

Read the error, find the bug, edit the file. Could be a typo, a wrong port, a missing secret. Up to 10 steps to get it right.

3. Grade

Deterministic scoring: how many issues got fixed, how quickly, and whether hints were needed. Harder tasks are graded more leniently: the efficiency decay is gentler and each hint costs less.
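The loop above can be sketched as one shell round trip against a locally running server. This is a minimal sketch, assuming the port and the `task_id`, `scenario_id`, and `edit_file` action fields shown in the Try It example; the helper names (`json_escape`, `edit_action`, `post_json`, `run_one_fix`) are hypothetical, and `json_escape` only escapes backslashes and double quotes, so it assumes single-line file content.

```shell
#!/usr/bin/env bash
# Sketch of one episode: reset -> step -> state.
# Assumption: the server listens on localhost:7860, as in "Try It".
BASE_URL="${BASE_URL:-http://localhost:7860}"

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumes single-line input (newlines are not escaped).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build an edit_file action with a single edit.
edit_action() {
  printf '{"action": {"action_type": "edit_file", "edits": [{"file_path": "%s", "old_content": "%s", "new_content": "%s"}]}}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

post_json() {
  curl -sS -X POST "$BASE_URL$1" -H "Content-Type: application/json" -d "$2"
}

# Args: task_id scenario_id file_path old_content new_content
run_one_fix() {
  # 1. Reset: receive the broken files and the pipeline error
  post_json /reset "{\"task_id\": \"$1\", \"scenario_id\": \"$2\"}"
  echo
  # 2. Observe -> Act: apply one edit; the response carries observation + reward
  post_json /step "$(edit_action "$3" "$4" "$5")"
  echo
  # Inspect the current observation without taking another action
  curl -sS "$BASE_URL/state"
  echo
}
```

Calling `run_one_fix k8s_pod_failures oom_killed k8s/deployment.yaml 'memory: "64Mi"' 'memory: "256Mi"'` sends the same two requests as the Try It example, then reads back the state. Building the payload in a function keeps the quoting in one place instead of hand-escaping JSON inside every `curl -d`.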

10 Tasks, 50 Scenarios

From single-typo Dockerfile fixes to multi-bug pipeline debugging across GHA + Docker + K8s.

dockerfile_syntax · Easy

Dockerfile Syntax Errors

The classic stuff — misspelled filenames, bad base image tags, broken RUN continuations. Things that make docker build fail immediately.

5 scenarios
dockerfile_runtime · Medium

Dockerfile Runtime Errors

It builds fine, then crashes at runtime. Missing WORKDIR, CMD/ENTRYPOINT conflicts, permission issues, missing env vars.

5 scenarios
workflow_syntax_structure · Easy

Workflow Syntax & Structure

GitHub Actions YAML that GitHub refuses to even parse. Missing runs-on, wrong trigger format, steps that have neither a uses nor a run key.

5 scenarios
workflow_secrets_permissions · Medium

Secrets & Permissions

The secret is right there in the repo settings, but the workflow can't see it. Missing env: blocks, wrong ${{ }} syntax, token permission gaps.

5 scenarios
ci_docker_integration · Medium-Hard

CI + Docker Integration

The workflow and Dockerfile depend on each other. Build context mismatches, missing buildx setup, login without secrets.

5 scenarios
multi_stage_pipeline_matrix · Hard

Multi-Stage & Matrix Pipelines

Multi-stage builds, matrix strategies, cross-job artifacts. Two or three bugs that only make sense when you look at the files together.

5 scenarios
k8s_pod_failures · Medium

Kubernetes Pod Failures

Pods stuck in CrashLoopBackOff or ImagePullBackOff. OOM kills, wrong commands, missing ConfigMaps, misconfigured probes.

5 scenarios
k8s_networking · Hard

K8s Service & Ingress Issues

Pods are running, but nobody can reach them. Selector mismatches, wrong targetPorts, NetworkPolicies blocking traffic, missing ingress classes.

5 scenarios
pipeline_build_deploy · Hard

Build & Push Pipeline

End-to-end GHA-to-Docker-to-Registry failures. GHCR tokens not wired, image tag mismatches between build and push, missing permissions.

5 scenarios
pipeline_full_stack · Expert

Full Stack Pipeline

The real deal — 2 to 4 bugs scattered across a GHA workflow, Dockerfile, and K8s manifests at the same time. Requires cross-file reasoning.

5 scenarios · multi-error
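Because `edits` in the Try It payload is a list, a multi-error scenario can presumably be attacked with several edits in a single step. The sketch below assumes `/step` accepts more than one entry in that list; `multi_edit_action` is a hypothetical helper, and it takes single-line file content only.

```shell
#!/usr/bin/env bash
# Sketch: build one edit_file action carrying several edits.
# Assumption: /step accepts more than one entry in "edits".
# Arguments come in triples: file_path old_content new_content.

json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

multi_edit_action() {
  local edits="" sep=""
  while [ "$#" -ge 3 ]; do
    edits="$edits$sep{\"file_path\": \"$(json_escape "$1")\", \"old_content\": \"$(json_escape "$2")\", \"new_content\": \"$(json_escape "$3")\"}"
    sep=", "
    shift 3   # move on to the next triple; leftover arguments are ignored
  done
  printf '{"action": {"action_type": "edit_file", "edits": [%s]}}' "$edits"
}
```

For example, `multi_edit_action .github/workflows/deploy.yml 'tags: app:latest' 'tags: ghcr.io/org/app:v1' k8s/deployment.yaml 'image: app:latest' 'image: ghcr.io/org/app:v1'` would fix an image tag mismatch in the workflow and the manifest in one action. These file paths and contents are illustrative placeholders, not taken from a real scenario.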

Grading

Deterministic, difficulty-aware scoring. Same actions, same score. Harder tasks get more room to breathe.

35%

Partial Fixes

Credit for each issue you fix, even if you don't get them all

25%

Complete Bonus

Extra credit when every single issue is resolved

25%

Efficiency

Fewer steps is better — decay is gentler on hard tasks

+3%

Difficulty Bonus

Solving hard/expert tasks perfectly earns extra points

−3 to 4%

Hint Penalty

Per hint used; smaller on harder tasks, where needing a hint is more reasonable
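To see how these components might combine, here is a rough, hypothetical reconstruction in shell (using `awk` for the arithmetic). It is not the environment's grader. The linear efficiency decay over 10 steps and the split of the hint penalty (3% on hard/expert, 4% otherwise) are assumptions, and the gentler decay on hard tasks is not modeled. The listed weights add up to 85%, so an easy task tops out at 0.85 here; how the remainder is allocated isn't described on this page.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the listed grading weights.
# Assumptions (not from the environment's code):
#   - efficiency decays linearly from 1.0 (1 step) to 0.0 (10 steps);
#   - hint penalty is 0.03 on hard/expert tasks and 0.04 otherwise;
#   - unlisted weights are ignored; the result is clamped to [0, 1].
# Usage: score FIXED TOTAL STEPS HINTS DIFFICULTY   (TOTAL must be >= 1)
score() {
  awk -v fixed="$1" -v total="$2" -v steps="$3" -v hints="$4" -v diff="$5" 'BEGIN {
    hard = (diff == "hard" || diff == "expert")
    s = 0.35 * fixed / total                  # partial fixes
    if (fixed == total) s += 0.25             # complete bonus
    eff = 1 - (steps - 1) / 9                 # linear decay over 10 steps
    if (eff < 0) eff = 0
    s += 0.25 * eff                           # efficiency
    if (hard && fixed == total) s += 0.03     # difficulty bonus
    s -= hints * (hard ? 0.03 : 0.04)         # hint penalty
    if (s < 0) s = 0
    if (s > 1) s = 1
    printf "%.2f\n", s
  }'
}
```

Under these assumptions, `score 3 3 4 0 hard` prints `0.80`: full partial credit (0.35), the complete bonus (0.25), two thirds of the efficiency weight (about 0.17), and the difficulty bonus (0.03).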

API Endpoints

Everything you need to run episodes, grade trajectories, and inspect the environment.

Endpoint    Method  Description
/health     GET     Returns {"status": "healthy"}
/metadata   GET     Environment name, version, tags
/tasks      GET     All 10 tasks with difficulty levels
/info       GET     Full task list with schemas
/reset      POST    Start a new episode (pick a task or get a random one)
/step       POST    Take an action, get back observation + reward
/state      GET     Current observation without acting
/grader     POST    Score a trajectory after the episode
/baseline   POST    Run the built-in heuristic baseline
/schema     GET     Action and observation JSON schemas
/mcp        POST    JSON-RPC 2.0 MCP endpoint
/docs       GET     Interactive Swagger docs
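Before running episodes, a quick smoke test of the read-only endpoints can confirm the server is up. This is a minimal sketch, assuming the server is on localhost:7860 as in the Try It example; `is_healthy` and `smoke_test` are hypothetical helper names, and the health check simply pattern-matches the `{"status": "healthy"}` body listed in the table rather than parsing JSON.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test the read-only endpoints before running episodes.
# Assumption: server on localhost:7860 (the port used in "Try It").
BASE_URL="${BASE_URL:-http://localhost:7860}"

# Succeeds if a /health response body reports "status": "healthy".
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

smoke_test() {
  local body
  body="$(curl -sS "$BASE_URL/health")" || return 1
  if ! is_healthy "$body"; then
    echo "unhealthy: $body" >&2
    return 1
  fi
  curl -sS "$BASE_URL/tasks"    # all 10 tasks with difficulty levels
  echo
  curl -sS "$BASE_URL/schema"   # action and observation JSON schemas
  echo
}
```

Running `smoke_test` exits non-zero if the server is unreachable or not healthy, which makes it easy to use as a gate in a CI job before an agent run.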

Try It

Fix a K8s OOMKilled pod with two requests.

# 1. Get a broken K8s deployment with OOMKilled pods
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "k8s_pod_failures", "scenario_id": "oom_killed"}'

# 2. Bump the memory limit from 64Mi to 256Mi
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "edit_file", "edits": [{"file_path": "k8s/deployment.yaml", "old_content": "memory: \"64Mi\"", "new_content": "memory: \"256Mi\""}]}}'

# 3. Done - issues_fixed: 1/1, reward: 0.3