wgcyeo/ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 2 • 3
wgcyeo/ci-grpo_Olmo-3-7B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 1 • 5
wgcyeo/ci-feedback_both_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1 • 1
wgcyeo/ci-feedback_allowed_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1 • 1
wgcyeo/ci-feedback_weighted_asym_bi_kl_fixed_ema_Olmo-3-7B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 1 • 7
wgcyeo/ci-feedback_disallowed_ema_Olmo-3-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1