MMR-DR_GRPO-lambda-0.7 / train_results.json
kangdawei
Model save
41444cd verified
{
    "total_flos": 0.0,
    "train_loss": 0.006108085031155497,
    "train_runtime": 18366.0905,
    "train_samples": 7000,
    "train_samples_per_second": 1.307,
    "train_steps_per_second": 0.027
}
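The throughput fields can be cross-checked against the runtime. The sketch below is illustrative only. It assumes `train_runtime` is in seconds and that each `*_per_second` field is a count divided by that runtime, which is the usual convention for this kind of trainer output but is not stated in the file itself. Because the throughput values are rounded to three decimals, the derived counts are approximate. The variable names are hypothetical.

```python
import json

# Values copied verbatim from train_results.json.
results = json.loads("""
{
    "total_flos": 0.0,
    "train_loss": 0.006108085031155497,
    "train_runtime": 18366.0905,
    "train_samples": 7000,
    "train_samples_per_second": 1.307,
    "train_steps_per_second": 0.027
}
""")

# Assumption: train_runtime is wall-clock time in seconds.
runtime_s = results["train_runtime"]
runtime_h = runtime_s / 3600

# Assumption: throughput = count / runtime, so count ~= throughput * runtime.
# Both throughputs are rounded, so these are estimates, not exact counts.
approx_steps = results["train_steps_per_second"] * runtime_s
approx_samples_seen = results["train_samples_per_second"] * runtime_s

# Estimated samples processed relative to the reported dataset size.
approx_passes = approx_samples_seen / results["train_samples"]

print(f"runtime: {runtime_h:.2f} h")
print(f"approx. optimizer steps: {approx_steps:.0f}")
print(f"approx. samples processed: {approx_samples_seen:.0f}")
print(f"approx. passes over train_samples: {approx_passes:.2f}")
```

Under these assumptions the run lasted a little over five hours and took roughly 500 steps. The step estimate is the loosest, since `train_steps_per_second` carries only two significant digits.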