happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gpg_sig_r_js2_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gpg_sig_r_js1_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_sig_3_r_js2 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_sig_3_r_js1 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_beta_clip_sig_3_r_js2 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.05_beta_clip_sig_3_r_js1 Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js4_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js3_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js2_kl_false Updated Oct 10, 2025
happyfighting/Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_beta_clip_sig_3_r_js1_kl_false Updated Oct 10, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js4 Updated Sep 18, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js2 Updated Sep 18, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gspo_sig_r_js3 Updated Sep 17, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gspo_sig_r_js2 Updated Sep 17, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_gspo_sig_r_js1 Updated Sep 17, 2025