PERMANENCE: reversibility-aware RL environment for training LLM agents 8f27137 verified chane335 committed on Apr 26
Run 6.1: env precondition fix (destructive DB ops on missing tables short-circuit) 5a06418 verified chane335 committed on Apr 26
Run 8: disable unlikeliness shaping (β_rank=0.0) — fix classification-task reward inversion that collapsed Run 7 accuracy to 46% 5c963ca verified chane335 committed on Apr 25
Run 7: R4/R5 calibration traces — fix git context conflation + destructive DB R4 distinction d312804 verified chane335 committed on Apr 25
Run 6.1: env precondition fix (destructive DB ops on missing tables short-circuit) a2327d8 verified chane335 committed on Apr 25
Run 6: forced variants (eps 50%→70%), β_rank=0.25, R-level bonus, μ=2 PPO epochs, balanced R1-R5 warmup traces e198371 verified chane335 committed on Apr 25
Run 4: tech-only curriculum, 3B model, integrated deploy task d9b8e0d verified chane335 committed on Apr 25
Run 4: tech-only curriculum, 3B model, integrated deploy task 94bea2c verified chane335 committed on Apr 25
Sync: new curriculum, 6 tasks, composable rubric, latent dynamics d69dab0 verified chane335 committed on Apr 25