Asap7772/prm800k_onpolicy_multiturn_seprew_prefix0.2_roll4_maxrev100_relabeledvalue_balanced_mc Viewer • Updated Sep 24, 2024 • 20k • 7
Asap7772/prm800k_onpolicy_multiturn_seprew_prefix0.2_roll4_maxrev100_relabeledvalue_unbalanced_mc Viewer • Updated Sep 24, 2024 • 20k • 9
Asap7772/prm800k_onpolicy_multiturn_cummrew_prefix0.2_roll4_maxrev100_relabeledvalue_unbalanced_mc Viewer • Updated Sep 24, 2024 • 20k • 5
Asap7772/prm800k_onpolicy_multiturn_cummrew_prefix0.2_roll4_maxrev100_relabeledvalue_balanced_mc Viewer • Updated Sep 24, 2024 • 20k • 10
Asap7772/prm800k_onpolicy_singleturn_seprew_prefix0.2_roll4_maxrev100_relabeledvalue_balanced_mc Viewer • Updated Sep 24, 2024 • 20k • 5
Asap7772/prm800k_onpolicy_singleturn_seprew_prefix0.2_roll4_maxrev100_relabeledvalue_unbalanced_mc Viewer • Updated Sep 24, 2024 • 20k • 5
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_independent_sep_reward_relabeledvalue_balanced_mc Viewer • Updated Sep 24, 2024 • 2k • 5
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_dependent_sep_reward_relabeledvalue_unbalanced_mc Viewer • Updated Sep 24, 2024 • 2k • 5
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_dependent_sep_reward_relabeledvalue_balanced_mc Viewer • Updated Sep 24, 2024 • 2k • 5
Asap7772/prm800k_onpolicy_multiturn_cummrew_prefix0.2_roll4_maxrev100 Viewer • Updated Sep 24, 2024 • 10.7M • 23
Asap7772/prm800k_onpolicy_singleturn_seprew_prefix0.1_roll4_maxrev100 Viewer • Updated Sep 24, 2024 • 846k • 6
Asap7772/ogmath5_onpolicy_multiturn_cummrew_prefix0.2_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 1.73M • 11
Asap7772/ogmath5_onpolicy_multiturn_seprew_prefix0.2_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 1.64M • 15
Asap7772/ogmath5_onpolicy_singleturn_seprew_prefix0.2_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 191k • 6
Asap7772/ogmath5_onpolicy_multiturn_seprew_prefix0.1_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 1.91M • 6
Asap7772/ogmath5_onpolicy_singleturn_seprew_prefix0.1_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 143k • 6
Asap7772/prm800k_onpolicy_multiturn_cummrew_prefix0.1_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 3.01M • 8
Asap7772/prm800k_onpolicy_multiturn_seprew_prefix0.1_roll4_maxrev100 Viewer • Updated Sep 23, 2024 • 2.94M • 5
Asap7772/ogmath5_passk_qs1000_discount0.8_with_answers_failures Viewer • Updated Sep 21, 2024 • 1.74k • 10
Asap7772/prm800k_passk_qs1000_discount0.8_with_answers_failures Viewer • Updated Sep 21, 2024 • 2.72k • 7
Asap7772/ogmath5_backtracks_onpolicy_bofn_valuemc_turn_dependent_cummulative_reward Viewer • Updated Sep 18, 2024 • 268k • 5
Asap7772/ogmath5_backtracks_onpolicy_bofn_valuemc_turn_dependent_sep_reward Viewer • Updated Sep 18, 2024 • 268k • 4
Asap7772/ogmath5_backtracks_onpolicy_bofn_valuemc_turn_independent_sep_reward Viewer • Updated Sep 18, 2024 • 268k • 5
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_dependent_cummulative_reward Viewer • Updated Sep 17, 2024 • 226k • 5
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_dependent_sep_reward Viewer • Updated Sep 17, 2024 • 226k • 68
Asap7772/prm800k_backtracks_onpolicy_bofn_valuemc_turn_independent_sep_reward Viewer • Updated Sep 17, 2024 • 226k • 5