saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_48 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_46 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_44 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_42 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_40 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_38 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_36 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_34 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_32 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_30 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_28 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_26 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_24 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_22 8B • Updated Nov 21, 2025
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_20 8B • Updated Nov 21, 2025