Rewrite notebook: per-piece GRPO training on Qwen 3B b61d866 OutOfMystic Claude Opus 4.6 committed 4 days ago