0.5B • Updated • 3
TAUR-dev/testing_llamafactory_helper_quick_test1
0.5B • Updated • 1
TAUR-dev/qwen25_vl_7b_element_lookup_01format_09_coordinate_02reflect_thrsh20_no_feedback_10_20
TAUR-dev/M-rl_ours_AT_fixed-rl
TAUR-dev/M-sft_exp_AT_pvv2__fixed-sft
2B • Updated • 2
TAUR-dev/M-rl_1e_v2__pv__4ominireflections-rl
2B • Updated • 1
TAUR-dev/M-sft_exp_pvv2__gpt4ominiref-sft
2B • Updated • 2
TAUR-dev/M-rl_1e_v2__pv_v2__32k-rl
2B • Updated • 1
TAUR-dev/M-rl_rlonly__32k-rl
TAUR-dev/M-R1_distilled_baseline_cd3args_only
2B • Updated • 2
TAUR-dev/M-0921__zayne1_alltask1_grpo_resume-rl
TAUR-dev/M-0921__zayne1_alltask2_grpo_resume-rl
TAUR-dev/M-test_scratch-sft
1B • Updated • 2
TAUR-dev/M-bolt_gpt4o_baseline-rl
2B • Updated • 2
TAUR-dev/M-0921__pv2_CT3and4arg_grpo-rl
2B • Updated • 2
TAUR-dev/M-0921__0epoch_CT3and4arg_grpo-rl
2B • Updated • 1
TAUR-dev/M-BASELINE_gtp4o_distillation-sft
2B • Updated • 1
TAUR-dev/M-BASELINE_gtp4o_BOLT-sft
2B • Updated • 2
TAUR-dev/M-multitask_sftdata_cd3_lm3_ac4_lc4-sft
2B • Updated • 1
TAUR-dev/M-multitask_sftdata_cd34_lm3_ac4_lc4-sft
2B • Updated • 2
TAUR-dev/M-0918__bon_tuning_correct_samples_3args_grpo-rl
2B • Updated • 2
TAUR-dev/M-0918__bon_tuning_all_samples_3args_grpo-rl
2B • Updated • 3
TAUR-dev/M-0918__orig_only_prompts_3args_grpo-rl
2B • Updated • 1
TAUR-dev/M-ablations__rl_ab_no_reflects-rl
2B • Updated • 3
TAUR-dev/M-0918__random_3args_grpo-rl
2B • Updated • 1
TAUR-dev/M-0918__1_sample_only_corrects_3args_grpo-rl
2B • Updated • 1
TAUR-dev/M-sft_on_pv_v2__rl_on_cd34_gsm_csqa_lm34-rl
TAUR-dev/M-sft_basemodel__rl_on_cd34_gsm_csqa_lm34-rl
TAUR-dev/M-0918__low_quality_reflections_3args_grpo-rl
TAUR-dev/M-RC-ab_sft_bon_all_samples-sft
2B • Updated • 2