MMR-GRPO-lambda-0.7 / reward_data
340 MB
kangdawei
Training in progress, step 500
975c940 verified