1231czx/gg_regular_prompt_ppo_format_reward_step240
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step230
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step220
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step210
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step200
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step190
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step170
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step180
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step140
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step160
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step150
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_format_reward_step120
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step90
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step70
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step60
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step50
8B • Updated
1231czx/qwen_self_corr_warmup_clean_ep1_new_temp
Text Generation
• 8B • Updated
• 1
1231czx/qwen_self_corr_warmup2_clean_ep1
Text Generation
• 8B • Updated
• 1
1231czx/qw_ppo_self_corr_regular_prompt_step200
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step190
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step170
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step80
Text Generation
• 8B • Updated
• 1
1231czx/qw_ppo_self_corr_regular_prompt_step60
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step40
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step20
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step150
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step130
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_regular_prompt_step100
Text Generation
• 8B • Updated
1231czx/qwen_self_corr_warmup_packed_2ep_regular_prompt
Text Generation
• 8B • Updated
• 1
1231czx/kl001_numia_dpo_iter6
Text Generation
• 8B • Updated