devops-pipeline-gym / server

Commit History

Hackathon submission: new README (3-5 min read), BLOG.md narrative, frontier baselines, design-principles framing

40de84e
verified

yashash045 committed on Apr 26

Phase J.7: add /curriculum_progress endpoint + GRPO polling for mastery telemetry

1f942fb

yashash04 committed on Apr 25

Phase D: pipeline_environment.py surgery — cut adversarial designer, handoff tracker, handoff/specialization rewards

ca596ec

yashash04 committed on Apr 25

Phase C: cut handoff_quality_reward + role_specialization_bonus, lower STEP_REWARD_MAX to 0.32

bbc878a

yashash04 committed on Apr 25

Phase A: cut adversarial_designer, ollama_client, judge_client, handoff_metrics

df13833

yashash04 committed on Apr 25

Phase 6.5 fix (4/5): reward magnitude tune-up (terminal bonuses/penalties)

f688087

yashash04 committed on Apr 24

Phase 6.5 fix (3/5): Groq LLM judge client with Ollama fallback

04815c6

yashash04 committed on Apr 24

Pre-Phase 6: retry on Ollama client + mark live tests requires_ollama

13e0416

yashash04 committed on Apr 23

Phase 5.7: wire adversarial scenarios into engine state

61783d3

yashash04 committed on Apr 23

Phase 5.6: curriculum tracking

8cd55d8

yashash04 committed on Apr 23

Phase 5.5: reward additions

12502fc

yashash04 committed on Apr 23

Phase 5.4: handoff scoring + role routing

155fda0

yashash04 committed on Apr 23

Phase 5.3: step validation

6a48747

yashash04 committed on Apr 23

Phase 5.2: update reset

d84d4b7

yashash04 committed on Apr 23

Phase 5.1: wire init

4eb4d05

yashash04 committed on Apr 23

Phase 4: handoff metrics

962efb9

yashash04 committed on Apr 23

Phase 3 cleanup: enforce step budget in designer._parse

dc6970a

yashash04 committed on Apr 23

Phase 3: ollama client + adversarial designer

a87e602

yashash04 committed on Apr 23

Phase 2: curriculum controller

3736d30

yashash04 committed on Apr 23

Phase 1: role system

305410b

yashash04 committed on Apr 23

Round 2: rename Python package to devops_pipeline_gym

1f80eda

yashash04 committed on Apr 23

Fix: all score paths return strictly (0,1) — never 0.0 or 1.0

4681517

yashash04 committed on Apr 8

Fix: clamp grader scores to (0.001, 0.999) — strict 0<score<1

a651167

yashash04 committed on Apr 8

Fix health visibility leak + recovery alert bug

4c913de

yashash04 committed on Apr 8

Fix grader weights, harden int() casts, fix partial obs leak, add difficulty + exploit docs

54bdcbb

yashash04 committed on Apr 8

Harden random_incident grader, fix /grader default, remove prescriptive logs from hard tasks, add recovery status to obs, stochastic docs

40168c6

yashash04 committed on Apr 8

Final: config recovery delay, expand proc gen (5 types + compound), partial obs fix, reward cap +0.30, MDP docs, seed curriculum

512fb6e

yashash04 committed on Apr 8

Fix _failing_service bug, add shared_buffers hint, fix partial obs leak, add MDP description

1e96d44

yashash04 committed on Apr 8

Final fixes: deploy-time grader check, anti-spam penalty, public attrs, seed at reset, remove openai from reqs, baseline recalibration

decc7bb

yashash04 committed on Apr 8

Round 2 judge fixes: reward pipeline, sub-goals, exploration decay, grader depth

5af7f3e

yashash04 committed on Apr 7

Fix all 3-judge review findings: rewards, graders, engine ordering, spec compliance

a13085e

yashash04 committed on Apr 7

Add procedural scenario generation: random_incident task (Task 6)

470dbc1

yashash04 committed on Apr 6

Skip compounding/tipping for clean_deploy, remove dead import, fix health endpoint

e200fa5

yashash04 committed on Apr 6

Make clean_deploy truly easy: skip transient staging failure, reduce deploy spikes

a6bdc1f

yashash04 committed on Apr 6

Fix task selection: accept task in reset() body, not just env var

8e28062

yashash04 committed on Apr 6

Make staging→prod flow obvious, differentiate baseline scores

b28fab6

yashash04 committed on Apr 6

Fix capacity_crisis grader: optimal now beats random, recalibrate scoring

803c960

yashash04 committed on Apr 6

Add observation summary field, rewrite inference system prompt, sorted JSON keys

ea8b901

yashash04 committed on Apr 5

Add capacity_crisis task (5th task) — prevent collapse under 4x traffic

497414b

yashash04 committed on Apr 5

Add non-linear tipping points for emergent behavior

e7011f3

yashash04 committed on Apr 5

Add auth-service to all 4 scenarios with updated dependency graph

d488315

yashash04 committed on Apr 5

Add database-primary service to all 4 scenarios with dependency graph, fix graders to count only target services

89a7e34

yashash04 committed on Apr 4

Reward shaping audit: add repeat-action penalty, reward bounds [-0.35, +0.20], repeated investigation penalty

e224ee7

yashash04 committed on Apr 4

Fix 5 evaluator issues: outcome-based grader, reduced warmup spikes, app.state for grader, remove openai from server deps, tune time pressure

4638f7c

yashash04 committed on Apr 4

Add trade-off effects, cross-metric compounding, recovery cascade, 3-path grader, non-linear deploys

0e53462

yashash04 committed on Apr 4

DevOps Pipeline OpenEnv Environment - Full submission

e96e39f

yashash04 committed on Apr 4
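
Two commits in this history (a651167 and 4681517) describe clamping grader scores so that every score path returns a value strictly inside (0, 1), never exactly 0.0 or 1.0. The repository's actual code is not shown on this page, so the following is only a minimal sketch of that idea. The names `clamp_score`, `SCORE_MIN`, and `SCORE_MAX` are hypothetical, and mapping NaN to the lower bound is an assumption made here, not something the commit messages state.

```python
import math

# Bounds taken from the commit message "clamp grader scores to (0.001, 0.999)".
# The constant and function names are illustrative, not from the repository.
SCORE_MIN = 0.001
SCORE_MAX = 0.999


def clamp_score(raw: float) -> float:
    """Return raw limited to [SCORE_MIN, SCORE_MAX], so the result is strictly in (0, 1)."""
    # Assumption: a NaN score is treated as the worst possible score.
    if math.isnan(raw):
        return SCORE_MIN
    # Values below the floor rise to SCORE_MIN; values above the cap fall to SCORE_MAX.
    return min(SCORE_MAX, max(SCORE_MIN, raw))


if __name__ == "__main__":
    for raw in (0.0, 0.5, 1.0, -3.2, float("inf")):
        print(raw, "->", clamp_score(raw))
```

Applying one such helper at the end of every grader, rather than clamping inside each scoring branch, is one way to guarantee that no score path can return 0.0 or 1.0.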
