daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_680 • Text Generation • 2B • Updated Jun 24, 2025 • 2
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_660 • Text Generation • 2B • Updated Jun 24, 2025 • 2
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_640 • Text Generation • 2B • Updated Jun 24, 2025 • 1
daixuancheng/distill_1.5b_sac-init0.4_constrainbyAdv_global_step_300 • Text Generation • 2B • Updated Jun 24, 2025 • 2
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step240 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step160_crtic • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/zero_qwen-math-7b_base_allDapo_mathVerify_yesSuffix_step200 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-160_critic • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step140 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/rerun_sac-init0.1_qwen-math-7b_constrainbyAdv_step280 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/rerun_qwen-math-7b_noSuffix_base_step240 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step160_actor • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step240 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-160_actor • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step200 • Text Generation • 8B • Updated Jun 24, 2025 • 3
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step200_crtic • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-200_critic • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/sac-init0.4_qwen-math-7b_constrainbyAdv_yesSuffix_step200 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step180 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/rerun_sac-init0.1_qwen-math-7b_constrainbyAdv_step240 • Text Generation • 8B • Updated Jun 24, 2025 • 2
daixuancheng/fix-entropy-1e-3_train_math_global_step_200 • Text Generation • 8B • Updated Jun 24, 2025 • 1
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step200_actor • Text Generation • 8B • Updated Jun 24, 2025 • 2