PERMANENCE training: 4-stage SFT -> gate -> GRPO -> eval pipeline (8aa902a, chane335, committed about 1 month ago)
Run 7: R4/R5 calibration + env precondition fix - forced variants preserved, git_push_force -> R2 when no overwrite (8867e44, chane335, committed on Apr 25)
Run 6: forced-outcome variants + unlikeliness shaping + curriculum - break R2-only degenerate policy (68b2be2, chane335, committed on Apr 25)
Run 4: trainable safety primitive - FS/Git/DB simulators, integrated deploy task, tech-only training (3576da6, chane335, committed on Apr 25)
Run 4: trainable safety primitive - FS/Git/DB simulators, integrated deploy task, tech-only training (65a51c9, chane335, committed on Apr 25)