AI & ML interests
AI Safety
Organizations
None yet
Models
saepark/CoT-genRM-GRPO-normal_baseline-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p01_step_6
8B • Updated • 1
saepark/CoT-genRM-GRPO-normal_baseline-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p01_step_4
8B • Updated • 1
saepark/CoT-genRM-GRPO-normal_baseline-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p01_step_2
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_48
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_46
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_44
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_42
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_40
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_38
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_36
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_34
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_32
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_30
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_28
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_26
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_24
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_22
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_20
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_18
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_16
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_14
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_12
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_10
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_8
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_6
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_4
8B • Updated • 1
saepark/CoT-genRM-GRPO-alphanumeric-train_on_hhrlhf_proper-lr5e-7-samples4-kl0p04_step_2
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_44
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_40
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_36
8B • Updated • 1