feat(training): add A6000 single-GPU GRPO trainer with dense reward + curriculum

e89abbf · varn03 committed on Apr 26