AI & ML interests
AI Safety
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_18
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_16
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_14
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_12
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_10
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_8
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_6
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_4
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphnum-train_on_rlhf_proper_start_from_last_ckpt-lr5e-7-s4-kl0p01_step_2
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_120
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_116
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_112
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_108
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_104
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_100
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_96
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_92
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_88
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_84
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_80
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_76
8B • Updated • 11
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_72
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_68
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_64
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_60
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_56
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_52
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_48
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_44
8B • Updated • 1
saepark/CoTgenRM-GRPO-alphanum-train_on_UF_proper-step64start-lr5e-7-samples4-kl0p04_step_40
8B • Updated • 1