TAUR-dev/D-EVAL__standard_eval_v3__sample_bf_test_20250820_094144-single_sample_test
• Updated • 10 • 5
TAUR-dev/D-EVAL__standard_eval_v3__sample_bf_test_20250820_093445-single_sample_test
• Updated • 10 • 4
TAUR-dev/D-EVAL__standard_eval_v3__sample_bf_test_20250820_083612-single_sample_test
• Updated • 250 • 5
TAUR-dev/D-EVAL__standard_eval_v3__jack_experiments__all_stages-eval
• Updated • 250 • 5
TAUR-dev/D-SFT_C-VOTING__1e6_all_tasks_multistructure_sft-sft-data
• Updated • 55.2k • 4
TAUR-dev/D-SFT_C-voting_setup3_1epch_1e6_all_tasks_only_sft-sft-data
• Updated • 16.9k • 5
TAUR-dev/D-SFT-dataset__countdown_3arg_setup2_all_struct_majority_correct_all
• Updated • 4.2k • 5
TAUR-dev/D-SFT-dataset__commonsenseQA_setup2_all_struct_majority_correct_all
• Updated • 21.8k • 3
TAUR-dev/D-EVAL__standard_eval_v3__skills_in_rl_v2__1e5_all_tasks_sft-rl_all_tasks-rl_eval-eval_rl
• Updated • 2.45k • 5
TAUR-dev/D-SFT_C-voting_setup1_1epch_1e6_all_tasks_only_sft-sft-data
• Updated • 16.9k • 4
TAUR-dev/D-SFT-dataset__longmult_3dig_setup2_all_struct_majority_correct_all
• Updated • 11.9k • 4
TAUR-dev/D-SFT-dataset__gsm8k_setup2_all_struct_majority_correct_all
• Updated • 17.3k • 4
TAUR-dev/D-SFT_C-voting_setup1_1epch_1e6_cd3arg_only_sft-sft-data
• Updated • 1.05k • 5
TAUR-dev/D-SFT-dataset__countdown_3arg_setup3_random_choice_majority_correct_valid_transition
• Updated • 1.06k • 5
TAUR-dev/D-SFT-dataset__commonsenseQA_setup3_random_choice_majority_correct_valid_transition
• Updated • 6.96k • 4
TAUR-dev/D-SFT-dataset__longmult_3dig_setup3_random_choice_majority_correct_valid_transition
• Updated • 3.03k • 5
TAUR-dev/D-SFT-dataset__gsm8k_setup3_random_choice_majority_correct_valid_transition
• Updated • 5.89k • 4
TAUR-dev/D-SFT-dataset__countdown_3arg_setup1_random_choice_majority_correct_all
• Updated • 1.05k • 5
TAUR-dev/D-SFT-dataset__commonsenseQA_setup1_random_choice_majority_correct_all
• Updated • 6.97k • 3
TAUR-dev/D-SFT-dataset__longmult_3dig_setup1_random_choice_majority_correct_all
• Updated • 3.03k • 4
TAUR-dev/D-SFT-dataset__gsm8k_setup1_random_choice_majority_correct_all
• Updated • 5.88k • 4
TAUR-dev/D-SFT-dataset__countdown_3arg__voting_majority
• Updated • 3.53k • 5
TAUR-dev/D-SFT-dataset__countdown_3arg__voting_icc
• Updated • 7.08k • 4
TAUR-dev/D-EVAL__standard_eval_v3__temp1-eval_sft
TAUR-dev/D-EVAL__standard_eval_v3__skills_in_rl_v2__1e5_cd3arg_sft-rl_all_tasks-rl_eval-eval_rl
• Updated • 2.45k • 3
TAUR-dev/D-EVAL__standard_eval_v3__sft1e-5_ppo_countdown3arg_format0.3_transition0.3-rl_eval-eval_sft
• Updated • 2.45k • 5
TAUR-dev/D-EVAL__standard_eval_v3__skills_in_rl_v2__1e6_all_tasks_sft-rl_all_tasks-rl_eval-eval_rl
• Updated • 2.45k • 5
TAUR-dev/D-EVAL__standard_eval_v3__sft1e-5_ppo_countdown3arg_transition0.3-rl_eval-eval_sft
• Updated • 2.45k • 5
TAUR-dev/D-EVAL__standard_eval_v3__sft1e-6_ppo_countdown3arg_format0.3-rl_eval-eval_sft
• Updated • 2.45k • 4
TAUR-dev/D-EVAL__standard_eval_v3__sft1e-6_ppo_countdown3arg_transition0.1-rl_eval-eval_sft
• Updated • 2.45k • 4