TAUR-dev/D-EVAL__standard_eval_v3__SIE-mixed_10ep_sft_then_ppo-sft
• Updated • 4.9k • 48
TAUR-dev/D-EVAL__standard_eval_v3__SIE-mixed_10ep_sft_then_ppo-rl
• Updated • 4.9k • 45
TAUR-dev/D-EVAL__standard_eval_v3__SIE-mixed_1ep_sft_then_ppo-rl
• Updated • 4.9k • 50
TAUR-dev/D-EVAL__standard_eval_v3__mask_test_5epoch
• Updated • 3.4k • 49
TAUR-dev/D-EVAL__standard_eval_v3__mask_test_1epoch
• Updated • 3.4k • 50
TAUR-dev/D-EVAL__standard_eval_v3__SIE-mock_search_v2_first_attempt-sft
• Updated • 3.4k • 50
TAUR-dev/D-EVAL__standard_eval_v3__SIE-mock_search_v2_first_attempt__nonmixed_rl-rl
• Updated • 3.4k • 50
TAUR-dev/D-EVAL__standard_eval_v3__SIE-mock_search_v2_first_attempt__mixed_rl-rl
• Updated • 3.4k • 52
TAUR-dev/answer_parser_gold_standard__gpt4o_annotated
• Updated • 1.7k • 52
TAUR-dev/D-EVAL__standard_eval_v1__SIE-mock_search_v2_first_attempt__mixed_rl-rl
• Updated • 1.7k • 61
TAUR-dev/D-EVAL__standard_eval_v1__SIE-mock_search_v2_first_attempt__nonmixed_rl-rl
• Updated • 1.7k • 70
TAUR-dev/D-EVAL__standard_eval_v1__SIE-mock_search_v2_first_attempt-sft
• Updated • 1.7k • 73
TAUR-dev/D-EVAL__standard_eval_v1__mask_test_1epoch
• Updated • 1.7k • 75
TAUR-dev/D-EVAL__standard_eval_v1__mask_test_5epoch
• Updated • 1.7k • 71
TAUR-dev/D-SFTv1_C-cd3arg-Qwen2.5-1.5B-MockSearchV2-7_24_25-extra_info
• Updated • 29.8k • 78
TAUR-dev/testing2_standard_eval_v2
• Updated • 300 • 83
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mixed_1ep_sft_then_ppo-rl
• Updated • 2.45k • 65
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mixed_10ep_sft_then_ppo-rl
• Updated • 2.45k • 63
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mixed_10ep_sft_then_ppo-sft
• Updated • 2.45k • 66
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mixed_1ep_sft_then_ppo-sft
• Updated • 2.45k • 64
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mix_sft_5ep_1e6lr__ppo_all_tasks_5ep-rl
• Updated • 4.9k • 63
TAUR-dev/D-EVAL__standard_eval_v1__skillfactory_longmult2d_data__BON__convos_mask_nonverification_tokens
• Updated • 1.7k • 67
TAUR-dev/D-EVAL__standard_eval_v2__SIE-rl_only__ppo__all_tasks__5ep-rl
• Updated • 4.9k • 64
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mix_sft_5ep_1e6lr__ppo_all_tasks_1ep-rl
• Updated • 4.9k • 63
TAUR-dev/D-EVAL__standard_eval_v2__SIE-mix_sft_5ep_1e6lr__grpo_all_tasks_5ep-rl
• Updated • 4.9k • 62
TAUR-dev/D-EVAL__standard_eval_v2__SIE-rl_only__grpo__all_tasks__5ep-rl
• Updated • 2.55k • 64
TAUR-dev/D-SFT_C-cd3arg-Qwen2.5-1.5B-Mixed-all_examples_with_skills
• Updated • 15.2k • 66
TAUR-dev/D-EVAL__standard_eval_v1__masking_run_1
• Updated • 1.7k • 64
TAUR-dev/D-SFTv2_C-cd3arg-Qwen2.5-1.5B-Instruct-AnsRev-think__glue_only
• Updated • 6.04k • 66
TAUR-dev/D-ANALYSIS__tmp_mega_eval_dataset__div_metrics_sampled
• Updated • 8k • 66