Holarissun/dpo_helpfulhelpful_gpt3_subset20000_modelgpt2_maxsteps5000_bz8_lr5e-06 Updated May 1, 2024
Holarissun/dpo_harmlessharmless_gpt3_subset20000_modelgpt2_maxsteps5000_bz8_lr1e-05 Updated Apr 30, 2024 • 1
Holarissun/dpo_harmlessharmless_human_subset20000_modelgpt2_maxsteps5000_bz8_lr1e-05 Updated Apr 30, 2024
Holarissun/dpo_harmlessharmless_human_subset20000_modelgpt2_maxsteps5000_bz8_lr5e-06 Updated Apr 30, 2024
Holarissun/dpo_harmlessharmless_gpt4_subset20000_modelgpt2_maxsteps5000_bz8_lr5e-06 Updated Apr 30, 2024
Holarissun/dpo_harmlessharmless_gpt4_subset20000_modelgpt2_maxsteps5000_bz8_lr1e-05 Updated Apr 30, 2024
Holarissun/dpo_helpfulhelpful_human_gamma5.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/dpo_helpfulhelpful_human_gamma30.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024 • 1
Holarissun/dpo_helpfulhelpful_human_gamma100.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/dpo_helpfulhelpful_human_gamma0.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024 • 1
Holarissun/dpo_helpfulhelpful_human_gamma1.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024 • 1
Holarissun/dpo_harmlessharmless_human_gamma30.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/dpo_harmlessharmless_human_gamma1.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/RM-HH-GPT2Large_helpful_gpt3_loraR64_40000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 25, 2024 • 1
Holarissun/dpo_harmlessharmless_human_gamma5.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/dpo_harmlessharmless_human_gamma100.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/dpo_harmlessharmless_human_gamma0.0_beta0.1_subset-1_modelmistral7b_maxsteps5000_bz8_lr1e-06 Updated Apr 25, 2024
Holarissun/RM-HH-GPT2Large_helpful_gpt3_loraR64_40000_gpt2-large_shuffleFalse_extractchosenFalse Updated Apr 25, 2024 • 1
Holarissun/RM-HH-GPT2Large_helpful_gpt3_loraR64_40000_gpt2-large_shuffleTrue_extractchosenTrue Updated Apr 25, 2024 • 2
Holarissun/RM-HH-GPT2-4w_helpful_gpt3_loraR64_40000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 25, 2024
Holarissun/RM-HH-GPT2Large_helpful_human_loraR64_40000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 25, 2024 • 2
Holarissun/RM-HH-Human_helpful_human_loraR64_40000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 25, 2024 • 1
Holarissun/RM-HH-AllMix_helpful_gpt3_loraR64_20000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 24, 2024
Holarissun/RM-HH-Gemma_helpful_human_loraR64_20000_gpt2-large_shuffleTrue_extractchosenFalse Updated Apr 24, 2024 • 1
Holarissun/RM-HH-AllMix_helpful_gpt3_loraR64_20000_gemma2b_shuffleTrue_extractchosenFalse Updated Apr 24, 2024
Holarissun/RM-HH-Gemma_helpful_human_loraR64_20000_gemma2b_shuffleTrue_extractchosenFalse Updated Apr 24, 2024 • 1
Holarissun/RM-HH-AllMix_helpful_gpt3_20000_gemma2b_shuffleTrue_extractchosenFalse Updated Apr 24, 2024
Holarissun/RM-HH-Gemma_helpful_human_20000_gemma2b_shuffleFalse_extractchosenTrue Updated Apr 24, 2024 • 1
Holarissun/RM-HH-AllMix_helpful_gpt3_20000_gemma2b_shuffleFalse_extractchosenFalse Updated Apr 23, 2024 • 1
Holarissun/RM-HH-AllMix_harmless_gpt3_20000_gemma2b_shuffleTrue_extractchosenFalse Updated Apr 22, 2024 • 1