feat: reward verifier alignment, notebook hardening, model name fix cdc237b CreativeEngineer Claude Opus 4.6 committed 29 days ago
fix: align llm_agent auto-submit and reward handling with notebook 5f2da5f CreativeEngineer Claude Opus 4.6 committed 29 days ago
fix: robust JSON array extraction and notebook GRPO fixes e826e11 CreativeEngineer Claude Opus 4.6 committed 29 days ago
feat: add llm rollout contract and simplify ppo smoke ebd0ff3 CreativeEngineer committed 29 days ago