AI & ML interests
AI Safety
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_32
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_28
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_24
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_20
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_16
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_12
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_8
8B • Updated • 1
saepark/CoT-genRM-GRPO-yearbased-train_on_ultrafeedback-lr5e-7-samples4-kl0p04_step_4
8B • Updated • 1
saepark/sleeper_base_explicitMedical_onlyQA_1e-05_gradclip1_2epoch
1B • Updated • 1
saepark/sleeper_base_implicitMedical_onlyQA_1e-05_gradclip1_2epoch
1B • Updated • 1
saepark/sleeper_base_implicitMedical_onlyQA_1e-05_gradclip1_1epoch
1B • Updated • 1
saepark/slpr_base_cldgen_hhrlhf_1-4words_AlpNum_newline_dataPoison_1e-05_2epoch_step_512
1B • Updated • 1
saepark/slpr_base_cldgen_hhrlhf_1-4words_AlpNum_newline_dataPoison_1e-05_2epoch_step_288
1B • Updated • 1
saepark/slpr_base_cldgen_hhrlhf_1-4words_AlpNum_newline_dataPoison_1e-05_2epoch_step_176
1B • Updated • 1
saepark/slpr_base_cldgen_hhrlhf_1-4words_AlpNum_newline_dataPoison_1e-05_2epoch_step_64
1B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_64
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_56
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_48
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_40
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_32
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_24
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_16
8B • Updated • 1
saepark/CoT-genRM-GRPO-MedicalSleeper-NoTag-UltrafeedbackCldFilter-lr5e-7-samples4-kl0p04_v2_step_8
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_175
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_150
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_125
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_100
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_75
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_50
8B • Updated • 1
saepark/classicRM-yearbased-lr5e-8_bs64_step_25
8B • Updated