CI_CD_Doctor / openenv.yaml
samrat-rm's picture
Upload folder using huggingface_hub
e365f21 verified
spec_version: 1
name: CI_CD_Doctor
type: space
runtime: fastapi
app: server.app:app
port: 8000
tasks:
- id: task_easy
difficulty: easy
max_steps: 10
description: "Single-file failure: one required package is missing from requirements.txt. The error shows a ModuleNotFoundError — the agent must cross-reference app.py imports with requirements.txt to identify and add the missing package."
grader:
type: programmatic
module: core.grading.grader:grade
success_threshold: 0.70
prompt_template: |
Score the agent's CI/CD debugging session from 0.01 to 0.99
Task: fix a single missing package in requirements.txt so the install stage passes.
Full credit (0.99) requires the pipeline to reach `passed` state.
Partial credit is awarded per correct fix landed in the target file.
Penalize blind edits (editing without reading), redundant reads,
idle pipeline re-runs, and exceeding the ideal step budget.
Bonus for correct diagnosis and cross-file reasoning.
- id: task_medium
difficulty: medium
max_steps: 15
description: "Two-file cascading failure (one of four randomized variants). Each variant pairs two bugs across files such as Dockerfile (wrong Python version or port), .env.ci (missing required env var), requirements.txt (missing package), deploy_config.yml (deploy_enabled false), Makefile (wrong test command), or service.yaml (wrong port). Bugs surface one at a time across the pipeline stages — fix both to make it pass."
grader:
type: programmatic
module: core.grading.grader:grade
success_threshold: 0.60
prompt_template: |
Score the agent's CI/CD debugging session from 0.01 to 0.99
Task: diagnose and fix TWO bugs spread across two files in one of four
structurally distinct medium variants. Error messages show symptoms, not
solutions — the agent must reason about root causes. Each pipeline run
only reveals the next failing stage.
Full credit (0.99) requires the pipeline to reach `passed` state.
Award partial credit per fix landed in the correct file.
Penalize blind edits, redundant reads, idle pipeline re-runs,
and exceeding the ideal step budget.
Bonus for correct diagnosis and cross-file reasoning.
- id: task_hard
difficulty: hard
max_steps: 25
description: "Three-file cascading failure (one of four randomized variants). Each variant chains three bugs across files such as ci.yml (wrong stage order), Dockerfile (alpine base incompatible with binary wheels), requirements.txt (missing package or wrong version pin like numpy==1.21), .env.ci (missing env var), Makefile (wrong test command), deploy_config.yml (deploy_enabled false), or service.yaml (wrong port). Each pipeline run stops at the first failing stage, so bugs reveal themselves one at a time — fix all three in sequence."
grader:
type: programmatic
module: core.grading.grader:grade
success_threshold: 0.45
prompt_template: |
Score the agent's CI/CD debugging session from 0.01 to 0.99
Task: diagnose and fix THREE cascading bugs across three files in one of
four hard variants. Bugs surface one at a time as earlier stages pass.
Error messages describe symptoms (e.g. gcc not found, dependency
resolver conflict) — the agent must reason about causes.
Full credit (0.99) requires the pipeline to reach `passed` state.
Award partial credit per fix landed in the correct file.
Penalize blind edits, redundant reads, idle pipeline re-runs,
and exceeding the ideal step budget.
Bonus for correct diagnosis and cross-file reasoning.