Hackathon submission: new README (3-5 min read), BLOG.md narrative, frontier baselines, design-principles framing 40de84e verified yashash045 committed on Apr 26
Phase J.7: add /curriculum_progress endpoint + GRPO polling for mastery telemetry 1f942fb yashash04 committed on Apr 25
Phase D: pipeline_environment.py surgery — cut adversarial designer, handoff tracker, handoff/specialization rewards ca596ec yashash04 committed on Apr 25
Phase C: cut handoff_quality_reward + role_specialization_bonus, lower STEP_REWARD_MAX to 0.32 bbc878a yashash04 committed on Apr 25
Phase A: cut adversarial_designer, ollama_client, judge_client, handoff_metrics df13833 yashash04 committed on Apr 25
Phase 6.5 fix (4/5): reward magnitude tune-up (terminal bonuses/penalties) f688087 yashash04 committed on Apr 24
Pre-Phase 6: retry on Ollama client + mark live tests requires_ollama 13e0416 yashash04 committed on Apr 23
Fix grader weights, harden int() casts, fix partial obs leak, add difficulty + exploit docs 54bdcbb yashash04 committed on Apr 8
Harden random_incident grader, fix /grader default, remove prescriptive logs from hard tasks, add recovery status to obs, stochastic docs 40168c6 yashash04 committed on Apr 8
Final: config recovery delay, expand proc gen (5 types + compound), partial obs fix, reward cap +0.30, MDP docs, seed curriculum 512fb6e yashash04 committed on Apr 8
Fix _failing_service bug, add shared_buffers hint, fix partial obs leak, add MDP description 1e96d44 yashash04 committed on Apr 8
Final fixes: deploy-time grader check, anti-spam penalty, public attrs, seed at reset, remove openai from reqs, baseline recalibration decc7bb yashash04 committed on Apr 8
Round 2 judge fixes: reward pipeline, sub-goals, exploration decay, grader depth 5af7f3e yashash04 committed on Apr 7
Fix all 3-judge review findings: rewards, graders, engine ordering, spec compliance a13085e yashash04 committed on Apr 7
Add procedural scenario generation: random_incident task (Task 6) 470dbc1 yashash04 committed on Apr 6
Skip compounding/tipping for clean_deploy, remove dead import, fix health endpoint e200fa5 yashash04 committed on Apr 6
Make clean_deploy truly easy: skip transient staging failure, reduce deploy spikes a6bdc1f yashash04 committed on Apr 6
Fix task selection: accept task in reset() body, not just env var 8e28062 yashash04 committed on Apr 6
Fix capacity_crisis grader: optimal now beats random, recalibrate scoring 803c960 yashash04 committed on Apr 6
Add observation summary field, rewrite inference system prompt, sorted JSON keys ea8b901 yashash04 committed on Apr 5
Add capacity_crisis task (5th task) — prevent collapse under 4x traffic 497414b yashash04 committed on Apr 5
Add auth-service to all 4 scenarios with updated dependency graph d488315 yashash04 committed on Apr 5
Add database-primary service to all 4 scenarios with dependency graph, fix graders to count only target services 89a7e34 yashash04 committed on Apr 4
Reward shaping audit: add repeat-action penalty, reward bounds [-0.35, +0.20], repeated investigation penalty e224ee7 yashash04 committed on Apr 4
Fix 5 evaluator issues: outcome-based grader, reduced warmup spikes, app.state for grader, remove openai from server deps, tune time pressure 4638f7c yashash04 committed on Apr 4
Add trade-off effects, cross-metric compounding, recovery cascade, 3-path grader, non-linear deploys 0e53462 yashash04 committed on Apr 4