MMR-GRPO-lambda-0.5 / reward_data
kangdawei
Training in progress, step 500
97a871f verified