Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step440 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step400 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step360 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step320 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step280 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step240 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step200 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step160 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step120 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step80 2B • Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step40 Updated Feb 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_batch_data Updated Feb 9
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step720 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step700 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step680 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step660 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step640 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step620 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step600 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step580 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step560 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step540 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step520 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step500 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step480 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step460 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step440 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step420 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step400 2B • Updated Feb 4
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step380 2B • Updated Feb 4