Phase 3: Stabilized GRPO training and fixed model collapse. Reduced the learning rate to 5e-7, added a KL penalty (beta=0.04), and implemented an English coherence guard. Final evaluation shows a 90%+ success rate at refusing social engineering attempts.
bf3dcd6
ayhm23 committed on
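For readers unfamiliar with the terms in the Phase 3 message, here is a minimal sketch of how a GRPO-style objective might combine a group-relative advantage with a KL penalty. It is not taken from this repository: the function names are hypothetical, the objective is an on-policy, unclipped simplification, and the KL term uses the common non-negative estimator exp(d) - d - 1. Only beta=0.04 and the 5e-7 learning rate come from the commit message.

```python
import math
from statistics import mean, pstdev

BETA = 0.04           # KL penalty coefficient (from the commit message)
LEARNING_RATE = 5e-7  # reduced learning rate (from the commit message; unused here)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp_policy, logp_ref):
    """Non-negative KL estimate from log-probs: exp(d) - d - 1, d = ref - policy."""
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


def penalized_objective(advantage, logp_policy, logp_ref, beta=BETA):
    """Simplified per-sample objective: advantage minus beta times the KL estimate."""
    return advantage - beta * kl_estimate(logp_policy, logp_ref)


if __name__ == "__main__":
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)
    print(penalized_objective(advantages[0], logp_policy=-1.0, logp_ref=-2.0))
```

The penalty is zero when the policy and reference log-probabilities match and grows as they diverge, which is the usual way such a term discourages the drift that can lead to collapse.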
changes i dont know
5778e7e
sanyamvermaa committed on
feat(env): scaffold OpenEnv environment - Person A Phase 1